Learning to express reward prediction error-like dopaminergic activity requires plastic representations of time

Ian Cone; Claudia Clopath; Harel Z. Shouval

doi:10.1038/s41467-024-50205-3

Nature Communications (Jul 2024)

Affiliations

Ian Cone: Department of Bioengineering, Imperial College London
Claudia Clopath: Department of Bioengineering, Imperial College London
Harel Z. Shouval: Department of Neurobiology and Anatomy, University of Texas Medical School at Houston

DOI: https://doi.org/10.1038/s41467-024-50205-3
Journal volume & issue: Vol. 15, no. 1, pp. 1–17

Abstract

The dominant theoretical framework to account for reinforcement learning in the brain is temporal difference (TD) learning, whereby certain units signal reward prediction errors (RPE). The TD algorithm has been traditionally mapped onto the dopaminergic system, as firing properties of dopamine neurons can resemble RPEs. However, certain predictions of TD learning are inconsistent with experimental results, and previous implementations of the algorithm have made unscalable assumptions regarding stimulus-specific fixed temporal bases. We propose an alternative framework to describe dopamine signaling in the brain, FLEX (Flexibly Learned Errors in Expected Reward). In FLEX, dopamine release is similar, but not identical, to RPE, leading to predictions that contrast with those of TD. While FLEX itself is a general theoretical framework, we describe a specific, biophysically plausible implementation, the results of which are consistent with a preponderance of both existing and reanalyzed experimental data.
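For readers unfamiliar with the RPE signal that the abstract refers to, the following is a minimal sketch of textbook TD(0) learning on a single cue-reward trial, using a fixed "tapped delay line" in which each time step after the cue has its own value entry. It illustrates the kind of fixed temporal basis the abstract criticizes; it is not the FLEX model and not the authors' code. The function name `td_trial_rpes` and all parameter values (`n_steps`, `alpha`, `gamma`, a unit reward) are assumptions chosen for illustration.

```python
import numpy as np

def td_trial_rpes(n_trials=500, n_steps=10, alpha=0.1, gamma=1.0):
    """Textbook TD(0) on repeated cue-reward trials (illustrative sketch).

    Assumptions: the cue starts a chain of n_steps states, one per time
    step; a reward of 1 arrives on leaving the last state; the pre-cue
    baseline has value 0 and is never updated (the cue is unpredictable).
    Returns an array of shape (n_trials, n_steps + 1) whose column 0 is
    the RPE at cue onset and whose last column is the RPE at reward time.
    """
    # V[t] is the learned value of the t-th step after the cue;
    # V[n_steps] is a terminal state whose value stays at 0.
    V = np.zeros(n_steps + 1)
    history = np.zeros((n_trials, n_steps + 1))

    for trial in range(n_trials):
        # RPE at cue onset: baseline (value 0, no reward) -> first cue state.
        history[trial, 0] = gamma * V[0] - 0.0

        for t in range(n_steps):
            reward = 1.0 if t == n_steps - 1 else 0.0
            # TD error: delta = r + gamma * V(next) - V(current)
            delta = reward + gamma * V[t + 1] - V[t]
            V[t] += alpha * delta
            history[trial, t + 1] = delta

    return history

rpes = td_trial_rpes()
first_trial, last_trial = rpes[0], rpes[-1]
```

In this sketch the RPE on the first trial appears only at reward time, while after training it appears at cue onset and vanishes at reward time. This is the classic pattern that motivated mapping TD errors onto dopamine activity.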

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
