Using Deep Reinforcement Learning with Automatic Curriculum Learning for Mapless Navigation in Intralogistics

Honghu Xue; Benedikt Hein; Mohamed Bakr; Georg Schildbach; Bengt Abel; Elmar Rueckert

doi:10.3390/app12063153

Applied Sciences (Mar 2022)

Affiliations

Honghu Xue: Institute for Robotics and Cognitive Systems, University of Luebeck, 23562 Luebeck, Germany
Benedikt Hein: Institute for Robotics and Cognitive Systems, University of Luebeck, 23562 Luebeck, Germany
Mohamed Bakr: KION Group AG, Technology and Innovation, 22113 Hamburg, Germany
Georg Schildbach: Institute for Electrical Engineering in Medicine, University of Luebeck, 23562 Luebeck, Germany
Bengt Abel: KION Group AG, Technology and Innovation, 22113 Hamburg, Germany
Elmar Rueckert: Institute for Cyber Physical Systems, Montanuniversität Leoben, 8700 Leoben, Austria

DOI: https://doi.org/10.3390/app12063153
Journal volume & issue: Vol. 12, no. 6, p. 3153

Abstract

We propose a deep reinforcement learning approach for solving a mapless navigation problem in warehouse scenarios. In our approach, an automatic guided vehicle is equipped with two LiDAR sensors and one frontal RGB camera and learns to perform a targeted navigation task. The challenges lie in the sparseness of positive samples for learning, multi-modal sensor perception with partial observability, the demand for accurate steering maneuvers, and long training cycles. To address these points, we propose NavACL-Q as an automatic curriculum learning method in combination with a distributed version of the soft actor-critic algorithm. The performance of the learning algorithm is evaluated exhaustively in a different warehouse environment to validate both the robustness and the generalizability of the learned policy. Results in NVIDIA Isaac Sim demonstrate that our trained agent significantly outperforms the map-based navigation pipeline provided by NVIDIA Isaac Sim with an increased agent-goal distance of 3 m and a wider initial relative agent-goal rotation of approximately 45°. The ablation studies also suggest that NavACL-Q greatly facilitates the whole learning process, with a performance gain of roughly 40% compared to training with random starts, and that a pre-trained feature extractor boosts performance by approximately 60%.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
