Drones (Aug 2024)
Multi-Unmanned Aerial Vehicle Confrontation in Intelligent Air Combat: A Multi-Agent Deep Reinforcement Learning Approach
Abstract
Multiple unmanned aerial vehicle (multi-UAV) confrontation is becoming an increasingly important combat mode in intelligent air combat. Such confrontation relies heavily on intelligent collaboration and real-time decision-making among the UAVs. Thus, a decomposed and prioritized experience replay (PER)-based multi-agent deep deterministic policy gradient (DP-MADDPG) algorithm is proposed in this paper for the moving and attacking decisions of UAVs. Specifically, the confrontation is formulated as a partially observable Markov game. To solve this problem, the DP-MADDPG algorithm integrates the decomposed and PER mechanisms into the traditional MADDPG. To overcome the technical challenges of convergence to a local optimum and of a single dominant policy, the decomposed mechanism modifies the MADDPG framework with local and global dual critic networks. Furthermore, to improve the convergence rate of the MADDPG training process, the PER mechanism optimizes sampling efficiency from the experience replay buffer. Simulations have been conducted on the Multi-agent Combat Arena (MaCA) platform, with the traditional MADDPG and independent learning DDPG (ILDDPG) algorithms as benchmarks. The results indicate that the proposed DP-MADDPG improves both the convergence rate and the converged reward value. In confrontations against a vanilla distance-prioritized rule-empowered blue party and an intelligent ILDDPG-empowered blue party, the DP-MADDPG-empowered red party achieves win rates of 96% and 80.5%, respectively.
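The PER mechanism mentioned in the abstract can be illustrated with a minimal sketch of a proportional prioritized replay buffer. This is not the paper's implementation; the class name, hyperparameters (`alpha`, `beta`, `capacity`), and the list-based storage are illustrative assumptions, shown only to convey how priority-weighted sampling replaces uniform sampling from the replay buffer.

```python
import random

class PERBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # priority exponent: 0 recovers uniform sampling
        self.data = []              # stored transitions
        self.priorities = []        # one priority per transition
        self.pos = 0                # next write position (circular buffer)

    def add(self, transition, td_error=1.0):
        # A transition's priority is derived from its TD error, with a
        # small floor so every transition keeps a nonzero sampling chance.
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sample indices with probability proportional to priority, and
        # return importance-sampling weights that correct the bias this
        # non-uniform sampling introduces into the critic update.
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize to [0, 1]
        batch = [self.data[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a training step, refresh priorities with the new TD errors.
        for i, e in zip(idxs, td_errors):
            self.priorities[i] = (abs(e) + 1e-6) ** self.alpha
```

In training, transitions with large TD errors are replayed more often, which is the source of the faster convergence the abstract attributes to the PER mechanism.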
Keywords