IEEE Access (Jan 2024)
Autonomous Air Combat Maneuver Decision-Making Based on PPO-BWDA
Abstract
As Unmanned Combat Aerial Vehicles (UCAVs) play an increasingly pivotal role in modern aerial warfare, enhancing their level of intelligence is imperative for military advancement. Despite notable progress in applying deep reinforcement learning to autonomous air combat maneuver decision-making, existing methods suffer from subpar performance, slow training, and susceptibility to local optima. This paper therefore proposes a new air combat maneuver decision algorithm based on Proximal Policy Optimization (PPO). Firstly, we establish a UCAV adversarial model and design a dual observation space. Secondly, we develop an Actor-Critic network based on Bidirectional Long Short-Term Memory (BiLSTM) and Multi-Head Self-Attention (MHSA), which better handles the high-dimensional, temporally correlated information in air combat situations. Thirdly, we propose an action selection method based on Parallel Monte Carlo Tree Search with Watch the Unobserved (WU-PMCTS) to help the algorithm make more effective maneuver decisions. Fourthly, we design a Dynamic Reward Evaluation (DRE) method that dynamically adjusts the weights of the various rewards according to the adversarial situation, improving algorithm performance. Finally, we introduce Advantage Prioritized Experience Replay (APER), which samples transitions according to their advantage values, enhancing training efficiency. Ablation and comparative experiments demonstrate the superiority of the proposed algorithm over PPO and other mainstream algorithms, with a 0.32 increase in average return and a 36% increase in win rate.
Keywords