Explore to Exploit

"May your choices reflect your hopes, not your fears." – Nelson Mandela

Personal Experience

I have always tried to live by these words in every choice, be it a simple one such as picking an ice-cream flavor or a difficult life choice such as marriage. Let me stay with the simple one. I always liked chocolate until I met my wife, who gently "forced" me to try butterscotch, and my preference changed. The point is: if you don't explore, you will not be able to exploit, or even experience. In today's AI world, where your every choice is tracked in order to improve your experience of making CHOICES, I sometimes wonder whether it is taking away my right to explore and try new things.

What if we had an algorithm that lets you explore things the way it happens in real life? An algorithm that lets you make mistakes, explore and innovate. I believe this would make Artificial Intelligence more natural.

Nowadays we jokingly say that if it is not on Google, it does not exist. I believe that today, if you have an idea and google it, you will find someone already working on it. The same thing happened when I read the news in August of Google publicizing its Dopamine library for reinforcement learning, followed by DeepMind publicizing its TRFL (pronounced "truffle") library.

These reinforcement learning algorithms help you EXPLORE, not only EXPLOIT.


Only a few years back, I read the news of AlphaGo, an AI-based bot, beating the world's number one Go player, and I believed it would not be long before the technology expanded to day-to-day personal and business applications. The libraries released by Google and DeepMind in recent months will help developers commoditize it.


Let's understand the concept of exploration through a retail case study: recommendation systems.

Recommender systems exist to solve the problem of choosing one item among plenty. Traditional machine learning approaches use collaborative filtering or content-based methods to recommend items. A reinforcement learning (RL) approach, by contrast, treats a recommender system as a Markov decision process (MDP). Traditional techniques typically build their content-based and collaborative filtering models from historical positive cases only (those where a product was purchased), whereas deep reinforcement learning (DRL) considers all cases, exploring the unknowns along with the knowns. According to the authors of "Prediction Machines", the biggest weakness of prediction machines is that they sometimes provide wrong answers that they are confident are right, which are nothing but the unknown knowns. DRL can help you explore them and overcome this weakness.

The algorithm is dynamic in nature: it continuously learns from the actions of individuals and entities. Other benefits of RL over traditional systems include overcoming the cold-start problem, online learning and explainable recommendations.
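To make the explore/exploit trade-off concrete, here is a minimal sketch of an epsilon-greedy recommender. Note that it swaps in a much simpler technique than the DRL/MDP approach described above: a multi-armed bandit with no notion of user state. The class name `EpsilonGreedyRecommender`, the flavor names and the simulated "like" probabilities are all hypothetical, chosen to echo the ice-cream story. With probability `epsilon` it explores a random item; otherwise it exploits the item with the highest estimated reward, and it updates that estimate after every piece of feedback.

```python
import random


class EpsilonGreedyRecommender:
    """Hypothetical epsilon-greedy bandit recommender (a simplification, not full DRL)."""

    def __init__(self, items, epsilon=0.1, seed=None):
        self.items = list(items)
        self.epsilon = epsilon                            # probability of exploring
        self.rng = random.Random(seed)
        self.counts = {item: 0 for item in self.items}    # feedback received per item
        self.values = {item: 0.0 for item in self.items}  # running mean reward per item

    def recommend(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)            # explore: try any item
        return max(self.items, key=self.values.get)       # exploit: best estimate so far

    def update(self, item, reward):
        # Incremental sample average: move the estimate toward the observed reward.
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]

    def best_item(self):
        return max(self.items, key=self.values.get)


# Simulated environment: hypothetical probability that a user "likes" each flavor.
LIKE_PROB = {"chocolate": 0.5, "butterscotch": 0.7, "vanilla": 0.3}

env_rng = random.Random(42)
rec = EpsilonGreedyRecommender(LIKE_PROB, epsilon=0.1, seed=7)
for _ in range(5000):
    item = rec.recommend()
    reward = 1.0 if env_rng.random() < LIKE_PROB[item] else 0.0  # like / no like
    rec.update(item, reward)

best = rec.best_item()
print(best, {k: round(v, 2) for k, v in rec.values.items()})
```

Setting `epsilon=0` gives a pure exploiter that can lock onto whichever item looked good first (the chocolate-forever scenario), while a small positive `epsilon` keeps trying butterscotch now and then, so its true appeal can surface.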

Relates to Human Psychology

Cognition, one of the pillars of learning science from the field of human psychology, tells us that learning happens through feedback from the environment. We keep updating our predictions and reactions based on that feedback. This is analogous to reinforcement learning models in machine learning. Such continuous feedback loops are not part of most current models. Reinforcement learning models are not limited to past experience (exploitation) but are open to exploring and receiving feedback. To achieve human-level intelligence, current models need to adopt continuous learning across knowledge models built around the human senses, and across domains.
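The MDP framing mentioned earlier is one way to formalize this feedback loop. As an illustration only (a textbook rule from the RL literature, not a description of how Dopamine or TRFL are implemented), tabular Q-learning updates its estimate Q(s, a) of the long-term value of taking action a in state s each time feedback arrives:

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

In a recommendation setting (an assumed mapping), s could be the user's recent browsing context, a the recommended item, r the feedback (for example 1 for a purchase, 0 otherwise), s' the resulting context, α the learning rate and γ the discount applied to future rewards. The bracketed term is the prediction error: the model nudges its estimate toward what the environment actually returned, much as we revise our own expectations. Deep RL methods such as DQN replace the table Q with a neural network.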

I am sure 2019 will see a lot of momentum in making Artificial Intelligence more natural by exploring to exploit.


According to Werner Goertz, Research Director at Gartner, reinforcement AI-enabled robotics will be a major personal technology trend by the end of 2019 and beyond.

“If you want something you’ve never had, you’ve got to do something you’ve never done before.” — Thomas Jefferson

Read the original article at https://medium.com/@iRahulKharat/explore-to-exploit-fc237b714588?source=rss——artificial_intelligence-5