Autonomous Maneuver Decision-Making Through Curriculum Learning and Reinforcement Learning With Sparse Rewards

Yujie Wei; Hongpeng Zhang; Yuan Wang; Changqiang Huang

doi:10.1109/ACCESS.2023.3297095

IEEE Access (Jan 2023)

Affiliations

Yujie Wei: Institute of Aeronautics Engineering, Air Force Engineering University, Xi’an, China
Hongpeng Zhang: Institute of Aeronautics Engineering, Air Force Engineering University, Xi’an, China
Yuan Wang: Institute of Aeronautics Engineering, Air Force Engineering University, Xi’an, China
Changqiang Huang: Institute of Aeronautics Engineering, Air Force Engineering University, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2023.3297095
Journal volume & issue: Vol. 11
pp. 73543 – 73555

Abstract

Reinforcement learning is an effective approach to decision-making problems. However, when it is applied to maneuver decision-making with sparse rewards, training takes too long and the final performance may be unsatisfactory. To overcome these shortcomings, a method for maneuver decision-making based on curriculum learning and reinforcement learning is proposed. First, three curricula are designed for the maneuver decision-making problem: an angle curriculum, a distance curriculum, and a hybrid curriculum. They follow the intuition that closer destinations are easier to reach. The curricula are then used to train agents, which are compared with agents trained by the original method without any curriculum. The training results show that the angle curriculum increases the speed and stability of training and improves the performance of maneuver decision-making; the distance curriculum increases the speed and stability of training; and the hybrid curriculum is no better than the other two, because it causes the agent to get stuck at a local optimum. The simulation results show that, after training, the agent can handle situations in which targets approach from different directions, and that its maneuver decisions are rational, effective, and interpretable, whereas the method without a curriculum is ineffective.
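
The abstract names the three curricula but gives no parameters, so the following Python sketch shows only one plausible reading of the "closer destinations are easier" intuition: each stage widens the range from which the initial angle and/or distance to the target is sampled. Every name, range, and stage count here (`CurriculumStage`, `build_stages`, `sample_initial_state`, 180 degrees, 40 km, four stages) is a hypothetical choice for illustration and is not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class CurriculumStage:
    # Hypothetical difficulty knobs; the paper's actual definitions are not given here.
    max_angle_deg: float    # largest initial angular offset to the target
    max_distance_km: float  # largest initial distance to the target


def build_stages(kind, n_stages=4, full_angle_deg=180.0, full_distance_km=40.0):
    """Return a list of stages that grow linearly toward the full task.

    kind: "angle", "distance", "hybrid", or "none" (no curriculum).
    Dimensions not covered by the chosen curriculum start at full range.
    """
    stages = []
    for i in range(1, n_stages + 1):
        frac = i / n_stages
        angle = full_angle_deg * frac if kind in ("angle", "hybrid") else full_angle_deg
        dist = full_distance_km * frac if kind in ("distance", "hybrid") else full_distance_km
        stages.append(CurriculumStage(angle, dist))
    return stages


def sample_initial_state(stage, rng, min_distance_km=1.0):
    """Draw one initial (angle, distance) pair within the stage's limits."""
    angle = rng.uniform(-stage.max_angle_deg, stage.max_angle_deg)
    distance = rng.uniform(min_distance_km, stage.max_distance_km)
    return angle, distance


if __name__ == "__main__":
    rng = random.Random(0)
    for stage in build_stages("angle"):
        print(stage, sample_initial_state(stage, rng))
```

In a training loop under these assumptions, the agent would move to the next stage once it meets some success criterion on the current one; that criterion is likewise not specified in the abstract.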

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
