An important problem facing decision makers is learning, by trial and error, which decisions to make so as to best obtain a reward or avoid punishment. In computer science, this problem is known as reinforcement learning. Let’s take a closer look.
Dopamine is one of the brain’s major neurotransmitters. A neurotransmitter is a chemical messenger that carries signals from one neuron to the next across a synapse.
The phasic activity of the midbrain dopamine neurons provides a global mechanism for synaptic modification. These synaptic modifications, in turn, provide the mechanistic underpinning for a specific class of reinforcement learning mechanisms that are now believed to underlie much of human and animal behavior.
In other words, dopamine links rewards to actions: it helps regulate movement, learning, attention, and emotional responses so that we can act to obtain rewards. It’s that simple.
In the context of learning, dopamine functions as a reward prediction error signal. Put simply, dopamine signals the difference between the reward that was expected and the reward that was actually received.
One of the most fundamental goals of any organism is to make accurate predictions of future events, so as to be prepared when this expected future arrives and, in turn, to adapt its behavior accordingly.
Generally speaking, learning can be defined as the process of improving these predictions of the future. Because the predictions are often not quite accurate, we need a way to calculate our prediction error so we don’t make the same mistakes again (hence the term reward prediction error).
These prediction errors are one of the most basic teaching signals that can be used to improve prediction accuracy for future rewards. In this sense, the ultimate goal of learning is to make accurate predictions, thus driving the prediction error to zero.
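This teaching signal can be sketched in a few lines of Python. The function name and the learning rate below are illustrative choices, not something specified in the text:

```python
# A minimal sketch of a reward prediction error (RPE) update.
# "expected" plays the role of the learned value estimate; the
# learning rate alpha is an illustrative assumption.

def rpe_update(expected, actual_reward, alpha=0.1):
    """Return the prediction error and the updated expectation."""
    error = actual_reward - expected          # delta: actual minus expected
    new_expected = expected + alpha * error   # nudge the estimate toward reality
    return error, new_expected

expected = 0.0
for _ in range(50):
    error, expected = rpe_update(expected, actual_reward=1.0)

# After repeated identical rewards, the expectation approaches 1.0
# and the prediction error shrinks toward zero.
```

Notice that once the expectation matches the reward, the error is zero and learning stops, which is exactly the “goal of eliminating the prediction error” described above.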
The “prediction error” hypothesis is interesting because reinforcement learning algorithms often use temporal difference learning, which makes heavy use of a signal that encodes prediction error.
Temporal difference learning refers to a class of model-free reinforcement learning methods which learn by bootstrapping (using a combination of recent information and previous estimations to generate new estimations) from the current estimate of the value function.
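Bootstrapping is easiest to see in a toy example. Below is a sketch of TD(0) value learning on a three-state chain; the environment, learning rate, and discount factor are all illustrative assumptions:

```python
# A toy sketch of TD(0) on a 3-state chain: states 0 -> 1 -> 2,
# with a reward of 1.0 for reaching the terminal state 2.
# ALPHA (learning rate) and GAMMA (discount) are illustrative.

ALPHA, GAMMA = 0.1, 0.9
V = [0.0, 0.0, 0.0]  # value estimates, one per state

for _ in range(500):              # many episodes
    for s in (0, 1):              # walk the chain left to right
        s_next = s + 1
        reward = 1.0 if s_next == 2 else 0.0
        # Bootstrapping: the target mixes the observed reward with the
        # *current estimate* of the next state's value, V[s_next].
        td_error = reward + GAMMA * V[s_next] - V[s]
        V[s] += ALPHA * td_error

# V[1] approaches 1.0, and V[0] approaches GAMMA * V[1] = 0.9:
# state 0's value is learned from state 1's estimate, not from
# waiting to observe the final reward directly.
```

The key point is the target `reward + GAMMA * V[s_next]`: each estimate is improved using another, still-imperfect estimate, which is what “bootstrapping” means here.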
In simple terms, reinforcement learning algorithms use prediction error to improve an agent’s ability to make better decisions in certain environments (e.g., while playing chess or Pac-Man).
Current data suggest that the phasic activity of midbrain dopamine neurons encodes a reward prediction error used to guide learning throughout the frontal cortex and the basal ganglia. This activity is now believed to signal that a subject’s estimate of the value of current and future events is in error, and to indicate the magnitude of this error.
This combined signal adjusts synaptic strengths in a quantitative manner until the subject’s estimate of the value of current and future events is accurately encoded in the frontal cortex and basal ganglia.
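Loosely speaking, one global error signal can adjust many “synaptic weights” at once. The sketch below is not a biological model; it illustrates the idea with a delta-rule (Rescorla-Wagner-style) update, where the stimuli and learning rate are invented for illustration:

```python
# A loose sketch (not a biological model) of a single shared error
# signal adjusting many "synaptic" weights at once, delta-rule style.
# The stimuli ("light", "tone") and ALPHA are illustrative assumptions.

ALPHA = 0.2
weights = {"light": 0.0, "tone": 0.0}

def trial(present, reward):
    """One learning trial: `present` is the set of active stimuli."""
    prediction = sum(weights[s] for s in present)
    error = reward - prediction           # one global error signal...
    for s in present:                     # ...broadcast to every active weight
        weights[s] += ALPHA * error

for _ in range(100):
    trial({"light", "tone"}, reward=1.0)  # compound stimulus, reward 1.0

# The two stimuli come to share the prediction: the weights sum to ~1.0,
# and the error (and hence further weight change) shrinks toward zero.
```

As in the biological account above, learning stops once the combined prediction matches the reward, because the shared error signal goes to zero.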
Altogether, these biological principles now inform many of our reinforcement learning algorithms, which have achieved great success in many domains.
Dopamine neurons may provide neurons in the brain with detailed information about the value of the future. This information, in turn, could potentially be used to plan and execute profitable behaviors and decisions well in advance of actual reward occurrence, and to learn about even earlier reliable predictors of reward.
Keeping this in mind, if we can teach our algorithms to do just that, they will continue growing stronger and smarter while inching towards a future where artificial general intelligence becomes a reality. ✌️