Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making

Vasiliki Liakoni; Marco P. Lehmann; Alireza Modirshanechi; Johanni Brea; Antoine Lutti; Wulfram Gerstner; Kerstin Preuschoff

NeuroImage (Feb 2022)

Brain signals of a Surprise-Actor-Critic model: Evidence for multiple learning modules in human decision making

Vasiliki Liakoni,
Marco P. Lehmann,
Alireza Modirshanechi,
Johanni Brea,
Antoine Lutti,
Wulfram Gerstner,
Kerstin Preuschoff

Affiliations

Vasiliki Liakoni: Corresponding author.; École Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences and School of Life Sciences, Lausanne, Switzerland
Marco P. Lehmann: École Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences and School of Life Sciences, Lausanne, Switzerland
Alireza Modirshanechi: École Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences and School of Life Sciences, Lausanne, Switzerland
Johanni Brea: École Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences and School of Life Sciences, Lausanne, Switzerland
Antoine Lutti: Laboratoire de recherche en neuroimagerie (LREN), Department of Clinical Neurosciences, Lausanne University Hospital and University of Lausanne, Lausanne, Switzerland
Wulfram Gerstner: École Polytechnique Fédérale de Lausanne (EPFL), School of Computer and Communication Sciences and School of Life Sciences, Lausanne, Switzerland
Kerstin Preuschoff: Geneva Finance Research Institute & Interfaculty Center for Affective Sciences, University of Geneva, Geneva, Switzerland

Journal volume & issue: Vol. 246
p. 118780

Abstract

Read online

Learning how to reach a reward over long series of actions is a remarkable capability of humans, and potentially guided by multiple parallel learning modules. Current brain imaging of learning modules is limited by (i) simple experimental paradigms, (ii) entanglement of brain signals of different learning modules, and (iii) a limited number of computational models considered as candidates for explaining behavior. Here, we address these three limitations and (i) introduce a complex sequential decision making task with surprising events that allows us to (ii) dissociate correlates of reward prediction errors from those of surprise in functional magnetic resonance imaging (fMRI); and (iii) we test behavior against a large repertoire of model-free, model-based, and hybrid reinforcement learning algorithms, including a novel surprise-modulated actor-critic algorithm. Surprise, derived from an approximate Bayesian approach for learning the world-model, is extracted in our algorithm from a state prediction error. Surprise is then used to modulate the learning rate of a model-free actor, which itself learns via the reward prediction error from model-free value estimation by the critic. We find that action choices are well explained by pure model-free policy gradient, but reaction times and neural data are not. We identify signatures of both model-free and surprise-based learning signals in blood oxygen level dependent (BOLD) responses, supporting the existence of multiple parallel learning modules in the brain. Our results extend previous fMRI findings to a multi-step setting and emphasize the role of policy gradient and surprise signalling in human learning.

Published in NeuroImage

ISSN: 1053-8119 (Print); 1095-9572 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: https://www.journals.elsevier.com/neuroimage

About the journal

Abstract

Keywords