Sensors (Jan 2023)

Variational Information Bottleneck Regularized Deep Reinforcement Learning for Efficient Robotic Skill Adaptation

  • Guofei Xiang,
  • Songyi Dian,
  • Shaofeng Du,
  • Zhonghui Lv

DOI
https://doi.org/10.3390/s23020762
Journal volume & issue
Vol. 23, no. 2
p. 762

Abstract

Deep Reinforcement Learning (DRL) algorithms have been widely studied for sequential decision-making problems, and substantial progress has been achieved, especially in autonomous robotic skill learning. However, deploying DRL methods in practical safety-critical robot systems remains difficult, since a gap between the training and deployment environments always exists, and this issue becomes increasingly crucial in ever-changing environments. Aiming at efficient robotic skill transfer in dynamic environments, we present a meta-reinforcement learning algorithm based on a variational information bottleneck. More specifically, during the meta-training stage, the variational information bottleneck is first applied to infer a complete set of basic tasks spanning the whole task space, and a maximum-entropy-regularized reinforcement learning framework is then used to learn the basic skills corresponding to those basic tasks. Once training is complete, every task in the task space can be expressed as a nonlinear combination of the basic tasks, and the skills required to accomplish it can likewise be obtained by a corresponding combination of the basic skills. Empirical results on several highly nonlinear, high-dimensional robotic locomotion tasks show that the proposed variational information bottleneck regularized deep reinforcement learning algorithm improves sample efficiency on new tasks by 200–5000 times and achieves substantial gains in asymptotic performance. These results indicate that the proposed meta-reinforcement learning framework is a significant step toward deploying DRL-based algorithms in practical robot systems.
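The two ingredients named in the abstract can be illustrated with a minimal sketch: a variational task encoder whose KL term to a standard normal prior plays the role of the information bottleneck, and an entropy-regularized (soft) policy objective conditioned on the inferred latent. This is not the paper's architecture; the module names, dimensions, and the beta/alpha coefficients below are illustrative assumptions only.

# Minimal sketch of (i) a variational information bottleneck on a latent task
# variable inferred from context transitions and (ii) a maximum-entropy (soft)
# policy objective. All names, sizes, and coefficients are assumptions for
# illustration, not the authors' exact design.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence

class TaskEncoder(nn.Module):
    """Amortized posterior q(z | context) over a latent task variable z."""
    def __init__(self, context_dim: int, latent_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(context_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)

    def forward(self, context: torch.Tensor) -> Normal:
        h = self.net(context)
        return Normal(self.mu(h), self.log_std(h).exp())

def vib_kl(posterior: Normal) -> torch.Tensor:
    """Information-bottleneck regularizer: KL(q(z | context) || N(0, I))."""
    prior = Normal(torch.zeros_like(posterior.mean),
                   torch.ones_like(posterior.stddev))
    return kl_divergence(posterior, prior).sum(-1).mean()

def soft_policy_loss(q_value: torch.Tensor,
                     log_prob: torch.Tensor,
                     alpha: float = 0.2) -> torch.Tensor:
    """Maximum-entropy policy objective: maximize Q plus alpha * entropy."""
    return (alpha * log_prob - q_value).mean()

# Composing the actor-side loss with the bottleneck penalty (toy data).
encoder = TaskEncoder(context_dim=32, latent_dim=8)
context = torch.randn(64, 32)        # batch of task-context features
posterior = encoder(context)
z = posterior.rsample()              # latent task sample; would condition the
                                     # policy and critic networks (omitted here)
q_value = torch.randn(64, 1)         # stand-in for a critic's estimate
log_prob = torch.randn(64, 1)        # stand-in for the policy log-probability
beta = 1e-3                          # bottleneck weight (assumed)
loss = soft_policy_loss(q_value, log_prob) + beta * vib_kl(posterior)
loss.backward()

In this kind of setup the KL weight controls how much task-specific information the latent retains, which is what lets held-out tasks be handled by recombining the learned latents rather than retraining from scratch.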

Keywords