Unmanned Aerial Vehicle Swarm Cooperative Decision-Making for SEAD Mission: A Hierarchical Multiagent Reinforcement Learning Approach

Longfei Yue; Rennong Yang; Jialiang Zuo; Ying Zhang; Qiuni Li; Yijie Zhang

doi:10.1109/ACCESS.2022.3202938

IEEE Access (Jan 2022)

Affiliations

Longfei Yue: Air Traffic Control and Navigation College, Air Force Engineering University, Xi’an, China
Rennong Yang: Air Traffic Control and Navigation College, Air Force Engineering University, Xi’an, China
Jialiang Zuo: Air Traffic Control and Navigation College, Air Force Engineering University, Xi’an, China
Ying Zhang: Air Traffic Control and Navigation College, Air Force Engineering University, Xi’an, China
Qiuni Li: Aeronautics Engineering College, Air Force Engineering University, Xi’an, China
Yijie Zhang: Xi’an Modern Control Technology Research Institute, Xi’an, China

DOI: https://doi.org/10.1109/ACCESS.2022.3202938
Journal volume & issue: Vol. 10, pp. 92177–92191

Abstract

Unmanned aerial vehicle (UAV) swarm cooperative decision-making has attracted increasing attention because of its low-cost, reusable, and distributed characteristics. However, existing non-learning-based methods rely on small-scale, known scenarios and cannot solve complex multi-agent cooperation problems in large-scale, uncertain scenarios. This paper proposes a hierarchical multi-agent reinforcement learning (HMARL) method to solve the heterogeneous UAV swarm cooperative decision-making problem for the typical suppression of enemy air defense (SEAD) mission. The problem is decoupled into two sub-problems: the higher-level target allocation (TA) sub-problem and the lower-level cooperative attacking (CA) sub-problem. An HMARL agent model is established, consisting of a multi-agent deep Q network (MADQN) based TA agent and multiple independent asynchronous proximal policy optimization (IAPPO) based CA agents. The MADQN-TA agent can dynamically adjust the TA schemes according to the relative position. To encourage exploration and improve learning efficiency, the Metropolis criterion and inter-agent information exchange techniques are introduced. The IAPPO-CA agent adopts an independent learning paradigm, which scales easily with the number of agents. Comparative simulation results validate the effectiveness, robustness, and scalability of the proposed method.
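
The abstract does not say how the Metropolis criterion is applied. As a rough illustration only, the sketch below shows one common way such a criterion is used for exploration in value-based learning: a randomly proposed action replaces the greedy one with probability exp(ΔQ / T). The function name `metropolis_select`, the `temperature` parameter, and the use of a plain list of Q-values are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def metropolis_select(q_values, temperature, rng=random):
    """Pick an action index with a Metropolis acceptance test.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Greedy action: index of the largest Q-value.
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if temperature <= 0:
        # No exploration left: always act greedily.
        return greedy
    # Propose a uniformly random action.
    candidate = rng.randrange(len(q_values))
    # delta is always <= 0 because greedy has the maximal Q-value.
    delta = q_values[candidate] - q_values[greedy]
    # Accept the proposal with probability exp(delta / temperature).
    if rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy
```

With a high temperature almost any proposal is accepted, so the agent explores widely; as the temperature is lowered toward zero the choice becomes greedy. A typical (assumed) usage is to decay `temperature` over training episodes.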

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
