Learning Reward Function with Matching Network for Mapless Navigation

Qichen Zhang; Meiqiang Zhu; Liang Zou; Ming Li; Yong Zhang

doi:10.3390/s20133664

Sensors (Jun 2020)

Learning Reward Function with Matching Network for Mapless Navigation

Qichen Zhang,
Meiqiang Zhu,
Liang Zou,
Ming Li,
Yong Zhang

Affiliations

Qichen Zhang: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116 China
Meiqiang Zhu: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116 China
Liang Zou: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116 China
Ming Li: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116 China
Yong Zhang: Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou 221116 China

DOI: https://doi.org/10.3390/s20133664
Journal volume & issue: Vol. 20, no. 13
p. 3664

Abstract

Read online

Deep reinforcement learning (DRL) has been successfully applied in mapless navigation. An important issue in DRL is to design a reward function for evaluating actions of agents. However, designing a robust and suitable reward function greatly depends on the designer’s experience and intuition. To address this concern, we consider employing reward shaping from trajectories on similar navigation tasks without human supervision, and propose a general reward function based on matching network (MN). The MN-based reward function is able to gain the experience by pre-training through trajectories on different navigation tasks and accelerate the training speed of DRL in new tasks. The proposed reward function keeps the optimal strategy of DRL unchanged. The simulation results on two static maps show that the DRL converge with less iterations via the learned reward function than the state-of-the-art mapless navigation methods. The proposed method performs well in dynamic maps with partially moving obstacles. Even when test maps are different from training maps, the proposed strategy is able to complete the navigation tasks without additional training.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors

About the journal

Abstract

Keywords