Real-Time Policy Optimization for UAV Swarms Based on Evolution Strategies

Zeyu Chen; Haiying Liu; Guohua Liu

doi:10.3390/drones8110619

Drones (Oct 2024)

Affiliations

Zeyu Chen: School of Mathematics, Southeast University, Nanjing 211102, China
Haiying Liu: College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Guohua Liu: School of Mathematics, Southeast University, Nanjing 211102, China

DOI: https://doi.org/10.3390/drones8110619
Journal volume & issue: Vol. 8, no. 11, p. 619

Abstract

Multi-agent decision-making faces many challenges such as non-stationarity and sparse rewards, while the complexity and randomness of the real environment further complicate policy development. This paper addresses the high-dimensional policy optimization problems of unmanned aerial vehicle (UAV) swarms. By modeling the problem scenario as a Markov decision process, a real-time policy optimization algorithm based on evolution strategy (ES) pre-training is proposed. This approach combines decision-time planning with background planning to evaluate and integrate different sets of policy parameters in a temporal context. In the experimental phase, the policy network is trained using both ES and REINFORCE algorithms on a constructed simulation platform. Comparative experiments demonstrate the effectiveness of using ES for policy pre-training. Finally, the proposed real-time policy optimization algorithm further improves the performance of the swarm by approximately 10% in simulations, offering a feasible solution for adversarial games between swarms and extending the research scope of evolutionary algorithms.
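
As a rough illustration of the ES idea named in the abstract, the following is a minimal sketch of a generic evolution strategy with antithetic (mirrored) sampling, applied to a toy objective. It is not the authors' algorithm or simulation platform: the function names, the quadratic `toy_fitness`, and all hyperparameters (`sigma`, `lr`, `pop_size`, `iterations`) are assumptions chosen only for illustration.

```python
import numpy as np


def evolution_strategy(fitness, theta0, sigma=0.1, lr=0.05,
                       pop_size=50, iterations=200, seed=0):
    """Generic ES sketch: estimate a search gradient from random perturbations.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(iterations):
        # Sample Gaussian perturbation directions, one row per population member.
        eps = rng.standard_normal((pop_size, theta.size))
        # Evaluate each direction and its mirror image (antithetic sampling).
        r_pos = np.array([fitness(theta + sigma * e) for e in eps])
        r_neg = np.array([fitness(theta - sigma * e) for e in eps])
        # Finite-difference style estimate of the fitness gradient.
        grad = ((r_pos - r_neg)[:, None] * eps).mean(axis=0) / (2.0 * sigma)
        # Gradient ascent step, since fitness is maximized.
        theta += lr * grad
    return theta


def toy_fitness(params):
    """Hypothetical stand-in for an episode return: peak at a fixed target."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((params - target) ** 2)


best = evolution_strategy(toy_fitness, np.zeros(3))
print(best)
```

In a swarm setting, `theta` would presumably hold the flattened policy-network weights and `fitness` would return the episode reward from the simulator; that mapping is an assumption here, since the abstract does not give those details.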

Published in Drones

ISSN: 2504-446X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Motor vehicles. Aeronautics. Astronautics
Website: http://www.mdpi.com/journal/drones
