IEEE Access (Jan 2021)
Integrating Multiple Policies for Person-Following Robot Training Using Deep Reinforcement Learning
Abstract
Given a training environment that follows a Markov decision process for a specific task, a deep reinforcement learning (DRL) agent can find possibly optimal policies that map states of the environment to appropriate actions by repeatedly trying various actions to maximize training rewards. However, the learned policies cannot be reused directly when training for other new tasks, which wastes precious time and resources. To solve this problem, we propose a DRL-based method for training an agent that selects, from a set of previously trained optimal policies, the policy appropriate for the current state of the environment, for a given task that can be decomposed into subtasks. We apply our proposed method to training a person-following robot, a task that can be broken down into three subtasks: navigation, left attending, and right attending. Using the proposed method, the optimal navigation policy obtained in our previous work is integrated with the attending policies trained in this study. We also introduce weight-scheduled action smoothing, which stabilizes the actions generated by the agent during attending-task training. Our experimental results show that the proposed method integrates all sub-policies using the action smoothing method even though the navigation and attending policies have dissimilar input structures and output ranges and are trained in different ways. Moreover, our proposed method outperforms both training from scratch and training with a transfer learning strategy.
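To make the two ideas in the abstract concrete, the following Python sketch illustrates (under assumed details not given in the paper) how an integrating agent might greedily select one of the three pre-trained sub-policies and then apply weight-scheduled action smoothing to the selected policy's output. The sub-policy functions, the Q-value selector, and the linear weight schedule are all hypothetical placeholders, not the paper's actual networks or schedule.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the three pre-trained sub-policies
# (navigation, left attending, right attending); each maps a state
# vector to a continuous action [linear_vel, angular_vel].
sub_policies = [
    lambda s: np.tanh(s[:2]),          # navigation
    lambda s: np.tanh(s[:2] + 0.1),    # left attending
    lambda s: np.tanh(s[:2] - 0.1),    # right attending
]

def selector_q_values(state):
    """Placeholder for the DRL selector: one Q-value per sub-policy.
    In the paper this would be a trained network, not random values."""
    return rng.normal(size=len(sub_policies))

def scheduled_weight(step, total_steps, w_start=0.9, w_end=0.1):
    """Linearly annealed smoothing weight (this schedule is an
    assumption): early steps lean on the previous action for
    stability, later steps let the raw policy output dominate."""
    frac = min(step / total_steps, 1.0)
    return w_start + frac * (w_end - w_start)

# One training episode: at each step, pick the sub-policy with the
# highest estimated value, then blend its raw action with the
# previously executed action to suppress jitter.
prev_action = np.zeros(2)
for step in range(100):
    state = rng.uniform(-1.0, 1.0, size=4)
    policy = sub_policies[int(np.argmax(selector_q_values(state)))]
    raw_action = policy(state)
    w = scheduled_weight(step, total_steps=100)
    action = w * prev_action + (1.0 - w) * raw_action
    prev_action = action

The blend on the last lines is the smoothing step: because the weight decays over training, the constraint tying consecutive actions together is gradually relaxed as the policy stabilizes.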
Keywords