Confrontation and Obstacle-Avoidance of Unmanned Vehicles Based on Progressive Reinforcement Learning

Chengdong Ma; Jianan Liu; Saichao He; Wenjing Hong; Jia Shi

doi:10.1109/ACCESS.2023.3278597

IEEE Access (Jan 2023)

Affiliations

Chengdong Ma: Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
Jianan Liu: Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
Saichao He: Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
Wenjing Hong: Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China
Jia Shi: Department of Chemical and Biochemical Engineering, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, China

Journal volume & issue: Vol. 11
pp. 50398 – 50411

Abstract

The core technique of unmanned vehicle systems is autonomous maneuvering decision-making, which not only determines the applications of unmanned vehicles but is also a critical technology that many countries are competing to develop. Reinforcement learning (RL) is a promising design method for autonomous maneuvering decision-making systems. Nevertheless, in complex decision-making tasks it remains challenging to learn the optimal policy, because a complex environment, a high-dimensional state space, and sparse rewards lead to low learning efficiency. Inspired by the human process of learning from simple to complex, this paper proposes a novel progressive deep RL algorithm for policy optimization in unmanned autonomous decision-making systems. The proposed algorithm divides the training of the autonomous maneuvering decision into a sequence of curricula whose learning tasks progress from simple to complex. A final self-play stage then iteratively optimizes the policy. Furthermore, a confrontation environment with two unmanned vehicles and obstacles is analyzed and modeled. Finally, simulation results on one-to-one adversarial tasks demonstrate the effectiveness and applicability of the proposed algorithm.
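
The abstract outlines a training schedule: a sequence of curricula ordered from simple to complex, followed by a self-play stage. The control flow of such a schedule might be sketched as below. This is a minimal illustration, not the authors' implementation: the names `Stage`, `progressive_training`, and `ToyLearner`, the promotion threshold, and the toy success-rate formula are all assumptions introduced here, and the stub learner stands in for a real deep RL agent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stage:
    """One curriculum task (hypothetical; the abstract does not list the tasks)."""
    name: str
    difficulty: int  # larger means harder


def progressive_training(
    stages: Sequence[Stage],
    train_on: Callable[[Stage], float],
    self_play_round: Callable[[], None],
    pass_threshold: float = 0.75,
    max_epochs_per_stage: int = 50,
    self_play_rounds: int = 2,
) -> List[str]:
    """Train on stages from easiest to hardest, then run self-play.

    Returns the names of the phases in the order they were run.
    """
    log: List[str] = []
    for stage in sorted(stages, key=lambda s: s.difficulty):
        for _ in range(max_epochs_per_stage):
            # One training epoch; the callback reports a success rate in [0, 1].
            success_rate = train_on(stage)
            if success_rate >= pass_threshold:
                break  # assumed promotion rule: move on once the task is mastered
        log.append(stage.name)
    for i in range(self_play_rounds):
        self_play_round()  # e.g. train against a frozen copy of the current policy
        log.append(f"self_play_{i}")
    return log


class ToyLearner:
    """Stand-in for an RL agent: skill grows by one unit per training epoch."""

    def __init__(self) -> None:
        self.skill = 0
        self.epochs = 0

    def train_on(self, stage: Stage) -> float:
        self.skill += 1
        self.epochs += 1
        # Toy success rate: harder stages need more accumulated skill.
        return min(1.0, self.skill / (4 * stage.difficulty))

    def self_play_round(self) -> None:
        self.skill += 1


if __name__ == "__main__":
    learner = ToyLearner()
    curriculum = [Stage("hard", 3), Stage("easy", 1), Stage("medium", 2)]
    print(progressive_training(curriculum, learner.train_on, learner.self_play_round))
```

Because skill acquired on an easy stage carries over to the next one, the toy learner reaches the pass threshold on harder stages sooner than it would starting from zero, which is the intuition behind a simple-to-complex schedule.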

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
