Jisuanji kexue yu tansuo (Aug 2023)
Offline Meta-Reinforcement Learning with Contrastive Prediction
Abstract
Traditional reinforcement learning algorithms require extensive online interaction with the environment during training and cannot adapt effectively to changes in the task environment, making them difficult to apply to real-world problems. Offline meta-reinforcement learning offers an effective way to adapt quickly to new tasks by learning policies offline from replay datasets collected across multiple tasks. Applying offline meta-reinforcement learning to complex tasks faces two challenges. First, reinforcement learning algorithms overestimate the value of state-action pairs absent from the dataset and consequently select suboptimal actions, resulting in poor performance. Second, meta-reinforcement learning algorithms must not only learn a policy but also perform robust and efficient task inference. To address these problems, this paper proposes an offline meta-reinforcement learning algorithm based on contrastive prediction. To counter value-function overestimation, the algorithm uses behavior cloning to encourage the policy to prefer actions contained in the dataset. To improve task inference in meta-learning, the algorithm applies recurrent neural networks to infer tasks from the agents' context trajectories, and uses contrastive learning together with a prediction network to analyze and distinguish the latent structures of different task trajectories. Experimental results show that agents trained by the proposed algorithm improve scores by more than 25 percentage points when faced with unseen tasks, and that the algorithm achieves higher meta-training efficiency and better generalization than existing methods.
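The two mechanisms summarized above, behavior-cloning regularization against value overestimation and contrastive task inference over context trajectories, can be illustrated with a minimal PyTorch sketch. This is an assumption-laden illustration, not the paper's implementation: the names (ContextEncoder, info_nce, actor_loss), the GRU encoder, the InfoNCE form of the contrastive loss, and the TD3+BC-style weight bc_weight are all illustrative choices.

    # Minimal sketch of the two mechanisms named in the abstract (assumed
    # implementation details, not the paper's actual code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContextEncoder(nn.Module):
        """GRU over a context trajectory -> latent task embedding z."""
        def __init__(self, in_dim, hidden_dim=128, z_dim=16):
            super().__init__()
            self.gru = nn.GRU(in_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, z_dim)

        def forward(self, context):          # context: (B, T, in_dim)
            _, h = self.gru(context)         # h: (1, B, hidden_dim)
            return self.head(h.squeeze(0))   # z: (B, z_dim)

    def info_nce(z_a, z_b, temperature=0.1):
        """Contrastive loss: two context windows drawn from the same task
        are positives; windows from other tasks in the batch are negatives."""
        z_a = F.normalize(z_a, dim=-1)
        z_b = F.normalize(z_b, dim=-1)
        logits = z_a @ z_b.t() / temperature                 # (B, B) similarities
        labels = torch.arange(z_a.size(0), device=z_a.device)  # diagonal = positives
        return F.cross_entropy(logits, labels)

    def actor_loss(policy, critic, obs, data_actions, z, bc_weight=2.5):
        """Q-maximizing objective plus a behavior-cloning penalty that
        discourages actions outside the dataset's support."""
        pred_actions = policy(obs, z)                # task-conditioned policy
        q = critic(obs, pred_actions, z)             # task-conditioned critic
        bc = F.mse_loss(pred_actions, data_actions)  # stay close to dataset actions
        return -q.mean() + bc_weight * bc

In this sketch, the BC term plays the role the abstract assigns to behavior cloning (keeping the policy within the dataset's action support), while info_nce pulls embeddings of context windows from the same task together and pushes different tasks apart; the prediction network mentioned in the abstract is omitted here for brevity.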
Keywords