The Optimal Strategies of Maneuver Decision in Air Combat of UCAV Based on the Improved TD3 Algorithm

Xianzhong Gao; Yue Zhang; Baolai Wang; Zhihui Leng; Zhongxi Hou

doi:10.3390/drones8090501

Drones (Sep 2024)

Affiliations

Xianzhong Gao: Test Center, National University of Defense Technology, Xi’an 710106, China
Yue Zhang: College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
Baolai Wang: College of Computer, National University of Defense Technology, Changsha 410073, China
Zhihui Leng: Jiangxi Hongdu Aviation Industry Group Co., Ltd., Nanchang 330096, China
Zhongxi Hou: Test Center, National University of Defense Technology, Xi’an 710106, China

DOI: https://doi.org/10.3390/drones8090501
Journal volume & issue: Vol. 8, no. 9, p. 501

Abstract

Unmanned aerial vehicles (UAVs) now pose a significant challenge to air defense systems, and unmanned combat aerial vehicles (UCAVs) have proven effective in countering them. Maneuver decision-making has therefore become a crucial technology for autonomous UCAV air combat. To address this problem, this paper proposes an autonomous maneuver decision-making model for UCAVs based on deep reinforcement learning. First, a six-degree-of-freedom (DoF) dynamic model is built in three-dimensional space, with the continuous actions of tangential overload, normal overload, and roll angle selected as the maneuver inputs. Second, to speed up the convergence of the deep reinforcement learning method, the idea of “scenario-transfer training” is introduced into the twin delayed deep deterministic policy gradient (TD3) algorithm; the results show that the improved algorithm cuts training time by about 60%. Third, the optimal maneuver generated by the proposed method is analyzed for the “nose-to-nose turn”, a classical maneuver used by experienced pilots. The results show that the maneuver strategy obtained by the proposed method is highly consistent with that of experienced fighter pilots. This is also the first public article to compare maneuver decisions made by a deep reinforcement learning method with those of experienced fighter pilots. This research can provide a useful reference for generating autonomous decision-making strategies for UCAVs.
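
For readers unfamiliar with TD3, the following is a minimal sketch of the target value that the standard algorithm computes for a three-component continuous action such as (tangential overload, normal overload, roll angle). It illustrates only the two generic TD3 ingredients, target policy smoothing and the clipped double-Q minimum; it does not reproduce the paper's "scenario-transfer training" or its network design. All function names, the action bounds, and the toy actor and critics are hypothetical and chosen purely for illustration.

```python
from typing import Callable, List, Sequence

State = Sequence[float]
Action = List[float]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(
    reward: float,
    done: bool,
    next_state: State,
    target_actor: Callable[[State], Action],
    target_q1: Callable[[State, Action], float],
    target_q2: Callable[[State, Action], float],
    noise: Sequence[float],
    action_low: Sequence[float],
    action_high: Sequence[float],
    gamma: float = 0.99,
    noise_clip: float = 0.5,
) -> float:
    """Standard TD3 target: r + gamma * (1 - done) * min(Q1', Q2')."""
    # Target policy smoothing: add clipped noise to the target action,
    # then clamp each component to its allowed range.
    raw = target_actor(next_state)
    smoothed = [
        clip(a + clip(n, -noise_clip, noise_clip), lo, hi)
        for a, n, lo, hi in zip(raw, noise, action_low, action_high)
    ]
    # Clipped double-Q: take the smaller of the two target critics
    # to reduce overestimation of the action value.
    q_min = min(target_q1(next_state, smoothed),
                target_q2(next_state, smoothed))
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Toy stand-ins (assumptions, not the paper's networks or limits).
# Action layout: [tangential overload, normal overload, roll angle in rad].
ACTION_LOW = [-1.0, -3.0, -3.14]
ACTION_HIGH = [2.0, 8.0, 3.14]


def toy_actor(state: State) -> Action:
    return [0.0, 1.0, 0.0]  # roughly "hold level flight"


def toy_q1(state: State, action: Action) -> float:
    return sum(action)


def toy_q2(state: State, action: Action) -> float:
    return 2.0


y = td3_target(
    reward=1.0, done=False, next_state=[0.0],
    target_actor=toy_actor, target_q1=toy_q1, target_q2=toy_q2,
    noise=[1.0, -0.2, 0.0],
    action_low=ACTION_LOW, action_high=ACTION_HIGH,
    gamma=0.9,
)
print(y)
```

In a real training loop the noise would be sampled from a Gaussian at every update and the actor and critics would be neural networks; here the noise is passed in explicitly so the calculation stays deterministic and easy to check by hand.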

Published in Drones

ISSN: 2504-446X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Motor vehicles. Aeronautics. Astronautics
Website: http://www.mdpi.com/journal/drones
