UAV Swarm Cooperative Dynamic Target Search: A MAPPO-Based Discrete Optimal Control Method

Dexing Wei; Lun Zhang; Quan Liu; Hao Chen; Jian Huang

doi:10.3390/drones8060214

Drones (May 2024)

Affiliations

Dexing Wei: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Lun Zhang: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Quan Liu: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Hao Chen: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
Jian Huang: College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China

Journal volume & issue: Vol. 8, no. 6
p. 214

Abstract


Unmanned aerial vehicles (UAVs) are commonly employed in pursuit and rescue missions in which the target's trajectory is unknown. Traditional methods, such as evolutionary algorithms and ant colony optimization, can generate a search route for a given scenario; however, whenever the scenario changes, the solution must be recalculated. In contrast, more advanced deep reinforcement learning methods can train an agent that can be applied directly to similar tasks without recalculation. Nevertheless, learning to search for unknown dynamic targets poses several challenges. In this search task, rewards are random and sparse, which makes learning difficult. In addition, because the agent must adapt to various scenario settings, it requires more interactions with the environment than typical reinforcement learning tasks do. These challenges increase the difficulty of training agents. To address these issues, we propose the OC-MAPPO method, which combines optimal control (OC) and Multi-Agent Proximal Policy Optimization (MAPPO) with GPU parallelization. The optimal control model provides the agent with continuous and stable rewards, and the parallelized models allow the agent to interact with the environment and collect data more rapidly. Experimental results demonstrate that the proposed method helps the agent learn faster and achieves a 26.97% increase in success rate compared with genetic algorithms.
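As a minimal sketch, and not the authors' implementation, the clipped surrogate objective at the core of PPO (which MAPPO applies in the multi-agent setting) can be written as below. The function name `ppo_clip_objective`, the clip range of 0.2, and the assumption that advantages are already computed (in MAPPO, typically from a centralized value function) are illustrative choices, not details from the paper. The optimal-control reward and the GPU parallelization described in the abstract are not modeled here; the batch is simply assumed to pool transitions from all agents and all parallel environments.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed clip range, a common PPO default
) -> float:
    """Mean clipped surrogate objective (to be maximized) over a batch.

    Hypothetical sketch: each entry is one (agent, environment, time step)
    transition pooled from all agents and parallel environments. The
    advantages are taken as given inputs (e.g. from a centralized critic).
    """
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("inputs must have equal length")
    if not advantages:
        raise ValueError("batch must be non-empty")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update size.
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (lower) bound of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio 2 with a positive advantage is clipped to 1 + eps = 1.2.
    print(ppo_clip_objective([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms keeps the update pessimistic: a large probability ratio cannot inflate the objective when the advantage is positive, yet a harmful action (negative advantage) is still fully penalized.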

Published in Drones

ISSN: 2504-446X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Motor vehicles. Aeronautics. Astronautics
Website: http://www.mdpi.com/journal/drones
