Multi-UAV Cooperative Pursuit of a Fast-Moving Target UAV Based on the GM-TD3 Algorithm

Yaozhong Zhang; Meiyan Ding; Yao Yuan; Jiandong Zhang; Qiming Yang; Guoqing Shi; Frank Jiang; Meiqu Lu

doi:10.3390/drones8100557

Drones (Oct 2024)

Affiliations

Yaozhong Zhang: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Meiyan Ding: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Yao Yuan: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Jiandong Zhang: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Qiming Yang: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Guoqing Shi: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Frank Jiang: Faculty of Science Engineering and Built Environment, Deakin University, Melbourne 3125, Australia
Meiqu Lu: School of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China

Journal volume & issue: Vol. 8, no. 10, p. 557

Abstract

Cooperative pursuit of a fast-moving target by multiple UAVs has recently become an active research topic. Although deep reinforcement learning (DRL) has achieved considerable success in UAV pursuit games, several problems remain: a high-dimensional parameter space, a tendency to fall into local optima, long training times, and low task success rates. To address these issues, we propose GM-TD3, an improved twin delayed deep deterministic policy gradient (TD3) algorithm that combines a genetic algorithm (GA) with the maximum mean discrepancy (MMD) method, for multi-UAV cooperative pursuit of high-speed targets. First, GA-based evolutionary strategies are combined with TD3 to generate action networks. Then, to avoid local optima during training, the MMD method is used to increase the diversity of the policy population when the population parameters are updated. Finally, the mutation operator is improved by setting sensitivity weights for the genetic memory buffer of individual UAVs, which enhances the stability of the algorithm. In addition, a hybrid reward function is designed to accelerate training convergence. Simulation experiments show that the improved algorithm trains considerably more efficiently and converges faster, that the task success rate reaches 95%, and that the UAVs cooperate more effectively to complete the pursuit task.
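As an illustration of the diversity measure named in the abstract, the following is a minimal sketch of a maximum mean discrepancy estimate between two sets of action samples, such as the actions two policies in a population output for the same states. The abstract does not state which kernel the authors use or how the MMD term enters the population update, so the Gaussian kernel, the biased estimator, and all names here (gaussian_kernel, mean_kernel, mmd_squared, sigma) are assumptions for illustration only.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def gaussian_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * sigma^2)). Assumed kernel choice."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )


# Hypothetical 2-D action samples from two policies evaluated on the same states.
actions_a = [(0.0, 0.0), (0.1, 0.0)]
actions_b = [(1.0, 1.0), (0.9, 1.0)]

same = mmd_squared(actions_a, actions_a)   # identical sample sets
apart = mmd_squared(actions_a, actions_b)  # clearly separated sample sets
```

The estimate is zero when the two sample sets are identical and grows as the action distributions separate, which is the property that makes it usable as a diversity signal across a policy population.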

Published in Drones

ISSN: 2504-446X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Motor vehicles. Aeronautics. Astronautics
Website: http://www.mdpi.com/journal/drones
