Automatic Curriculum Design for Object Transportation Based on Deep Reinforcement Learning

Gyuho Eoh; Tae-Hyoung Park

doi:10.1109/ACCESS.2021.3118109

IEEE Access (Jan 2021)

Affiliations

Gyuho Eoh: Industrial AI Research Center, Chungbuk National University, Cheongju, South Korea
Tae-Hyoung Park: Industrial AI Research Center, Chungbuk National University, Cheongju, South Korea

DOI: https://doi.org/10.1109/ACCESS.2021.3118109
Journal volume & issue: Vol. 9
pp. 137281 – 137294

Abstract

This paper presents an automatic curriculum learning (ACL) method for object transportation based on deep reinforcement learning (DRL). Previous DRL studies on object transportation suffer from a sparse reward problem: the agent is rewarded only rarely, when it completes the transportation of an object. Curriculum learning (CL) is generally used to address the sparse reward problem. However, conventional CL methods must be designed manually by users, which is difficult and tedious work. Moreover, there has been no standard CL method for object transportation. We therefore propose an ACL method for object transportation that requires no human intervention at the training step. A robot automatically designs its own curricula and trains iteratively according to them. First, we define the difficulty level of object transportation using a map, which is determined by the predicted travelling distance of an object and the existence of obstacles and walls. The robot first learns object transportation at an easy level (i.e., the travelling distance is short and there are few obstacles nearby) and then learns a difficult task (i.e., the object must travel a long distance and there are many obstacles nearby). Second, because training time also affects the performance of object transportation, we propose a method that adaptively determines the number of training episodes based on the current success rate of object transportation. We verified the proposed method in simulation environments, where its success rate was 14% higher than that of training without a curriculum. The proposed method also achieved success rates 14% (minimum) to 63% (maximum) higher than those of manual curriculum methods. Additionally, we conducted real experiments to verify the gap between simulation and practical results.
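
The two ideas in the abstract, ordering tasks from easy to difficult and adapting the number of training episodes to the current success rate, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the difficulty formula or the stopping rule, so the additive score in `difficulty`, the `obstacle_weight` parameter, the sliding-window rule in `train_level`, its default thresholds, and the `run_episode` callback are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    travel_distance: float  # predicted travelling distance of the object
    n_obstacles: int        # obstacles and walls near the object


def difficulty(task, obstacle_weight=1.0):
    # Hypothetical score: longer distance and more obstacles mean harder.
    return task.travel_distance + obstacle_weight * task.n_obstacles


def build_curriculum(tasks):
    # Order tasks from easy to difficult.
    return sorted(tasks, key=difficulty)


def train_level(run_episode, task, target_rate=0.8, window=20, max_episodes=500):
    """Train on one task until the recent success rate reaches target_rate.

    run_episode is a hypothetical callback that runs one training episode
    on the task and returns True when the object was transported.
    Returns the number of episodes used.
    """
    recent = deque(maxlen=window)  # outcomes of the last `window` episodes
    for episode in range(1, max_episodes + 1):
        recent.append(bool(run_episode(task)))
        if len(recent) == window and sum(recent) / window >= target_rate:
            return episode
    return max_episodes
```

Under these assumptions, a training loop would call `build_curriculum` once and then `train_level` for each task in order, so easy levels that are mastered quickly consume few episodes and harder levels receive more.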

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
