IEEE Access (Jan 2024)
Automatic Curriculum Determination for Deep Reinforcement Learning in Reconfigurable Robots
Abstract
Deep reinforcement learning (DRL) is a prevalent learning method in robotics and is commonly applied in real-world scenarios, such as learning motion behavior in rough terrain. However, the lengthy training it requires reduces its practicality in many such environments. Curriculum learning can significantly enhance the efficiency of DRL, but establishing a curriculum is challenging, partly because it can be difficult to assess the operational complexity of each task. Determining this complexity is particularly difficult for reconfigurable search and rescue robots. We present a learning method based on an automatically established curriculum tuned to the robot’s perspective. The method is especially suitable for outdoor environments with multiple obstacle variants, e.g., those encountered in search and rescue missions. After an initial learning stage, the robot’s behavior when overcoming each obstacle variant is characterized using Gaussian mixture models (GMMs). The Hellinger distance between the GMMs is computed and used to cluster the variants hierarchically, and the curriculum is determined from the resulting clusters and the average success rate in each cluster. The method was implemented on RSTAR, a highly maneuverable and reconfigurable field robot that can overcome a variety of obstacles. Learning with the automatically determined curriculum was compared to learning without a curriculum in a simulation with three obstacle types: a narrow channel, a low entrance, and a step. The results show that the automatically determined curriculum enables the robot to overcome all obstacles faster and with higher success rates than learning without a curriculum, especially for complex obstacle variants. The developed method thus offers a promising approach for learning motion behavior in real-world scenarios.
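To make the pipeline in the abstract concrete (per-variant GMMs, pairwise Hellinger distances, hierarchical clustering, ordering by success rate), the following is a minimal sketch. It is not the paper’s implementation: the behavior features, success-rate values, GMM component count, clustering threshold, and the Monte Carlo approximation of the Hellinger distance between GMMs (the paper does not state how the distance is evaluated) are all illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def hellinger_distance(gmm_p, gmm_q, n_samples=20000):
    """Monte Carlo estimate of the Hellinger distance between two fitted GMMs.

    Uses H^2(p, q) = 1 - BC(p, q), where the Bhattacharyya coefficient
    BC = E_{x~p}[sqrt(q(x) / p(x))] is estimated from samples drawn from p.
    """
    x, _ = gmm_p.sample(n_samples)
    log_ratio = gmm_q.score_samples(x) - gmm_p.score_samples(x)
    bc = np.exp(0.5 * log_ratio).mean()
    return np.sqrt(max(1.0 - bc, 0.0))  # clip to guard against MC noise


# Illustrative data: one behavior-feature matrix per obstacle variant
# (in the paper, features would come from the initial learning stage).
rng = np.random.default_rng(0)
variants = ["step_low", "step_high", "channel", "entrance"]
features = {v: rng.normal(loc=i, scale=1.0, size=(200, 3))
            for i, v in enumerate(variants)}
success_rate = {"step_low": 0.9, "step_high": 0.4,
                "channel": 0.7, "entrance": 0.6}  # assumed values

# Fit one GMM per obstacle variant (component count is a free choice here).
gmms = {v: GaussianMixture(n_components=2, random_state=0).fit(X)
        for v, X in features.items()}

# Pairwise Hellinger distances -> condensed matrix -> hierarchical clustering.
n = len(variants)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = hellinger_distance(gmms[variants[i]], gmms[variants[j]])
        dist[i, j] = dist[j, i] = d
clusters = fcluster(linkage(squareform(dist), method="average"),
                    t=0.5, criterion="distance")

# Order clusters from easiest to hardest by mean success rate.
cluster_ids = sorted(set(clusters),
                     key=lambda c: -np.mean([success_rate[variants[i]]
                                             for i in range(n)
                                             if clusters[i] == c]))
curriculum = [[variants[i] for i in range(n) if clusters[i] == c]
              for c in cluster_ids]
print(curriculum)  # e.g., [['step_low'], ['channel', 'entrance'], ['step_high']]
```

Sampling from one GMM and reweighting by the density ratio gives an unbiased estimate of the Bhattacharyya coefficient, which is convenient because the Hellinger distance between mixtures has no closed form; any other consistent estimator would serve the same role.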
Keywords