4 votes

Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI

1 comment

  1. skybrian

    From the article:

    One of the algorithmic developments that has made reinforcement learning work better with neural networks is distributional reinforcement learning. In many situations (especially in the real world), the amount of future reward that will result from a particular action is not a perfectly known quantity, but instead involves some randomness.

    [...]

    Some predictors selectively "amplify" or "overweight" their reward prediction errors (RPEs) when the error is positive. This causes the predictor to learn a more optimistic reward prediction, corresponding to a higher part of the reward distribution. Other predictors amplify their negative reward prediction errors, and so learn more pessimistic predictions. Taken together, a set of predictors with a diverse range of pessimistic and optimistic weightings maps out the full reward distribution.

    [...]

    In this work, we collaborated with an experimental lab at Harvard to analyse their recordings of dopamine cells in mice. The recordings were made while the mice performed a well-learned task in which they received rewards of unpredictable magnitude (indicated by the dice illustration in Figure 4). We evaluated whether the activity of dopamine neurons was more consistent with standard TD or distributional TD.

    [...]

    In summary, we found that dopamine neurons in the brain were each tuned to different levels of pessimism or optimism.

    Summary of the summary: having both optimistic and pessimistic neurons seems to be useful for both computers and mice.
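
    The asymmetric-weighting mechanism quoted above amounts to running many TD learners in parallel, each with its own asymmetric learning rate. Here is a minimal sketch of that idea in Python; this is not the authors' code, and the reward values, taus, and learning rate are made-up illustrative numbers. Each predictor scales positive RPEs by tau and negative RPEs by 1 - tau, so its fixed point is an expectile of the reward distribution rather than the mean.

    ```python
    import numpy as np

    rng = np.random.default_rng(seed=0)

    # Hypothetical reward distribution: rewards of unpredictable magnitude,
    # loosely modeled on the variable-magnitude task described in the article.
    # These values are illustrative, not taken from the paper.
    reward_values = np.array([0.1, 0.3, 1.2, 2.5, 5.0, 10.0, 20.0])

    # Each predictor has its own asymmetry tau in (0, 1). tau > 0.5 overweights
    # positive RPEs (an optimistic predictor); tau < 0.5 overweights negative
    # RPEs (a pessimistic one). tau = 0.5 recovers standard TD toward the mean.
    taus = np.linspace(0.1, 0.9, 9)
    predictions = np.zeros_like(taus)
    lr = 0.01

    for _ in range(50_000):
        r = rng.choice(reward_values)   # sample a reward of random magnitude
        rpe = r - predictions           # one reward prediction error per predictor
        # Asymmetric update: positive errors are scaled by tau, negative
        # errors by (1 - tau), so each predictor settles on a different
        # part of the reward distribution instead of its mean.
        weight = np.where(rpe > 0, taus, 1.0 - taus)
        predictions += lr * weight * rpe

    # Predictions increase with tau: pessimistic predictors sit low in the
    # reward distribution, optimistic ones sit high; together they map it out.
    print(np.round(predictions, 2))
    ```

    With tau = 0.5 the update reduces to ordinary TD learning of the expected reward, which is the sense in which distributional TD generalizes the standard algorithm.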