IEEE Access (Jan 2024)
Research on the Reward Design Method for Deep Reinforcement Learning in WVR Air Combat
Abstract
Deep reinforcement learning (DRL) can significantly improve the autonomy and effectiveness of air combat maneuver decision (ACMD). However, designing reward functions is challenging because air combat engagements last a long time and have a large state space, and most existing methods rely on subjective judgment and cumbersome parameter tuning. This paper therefore proposes a reward design paradigm that provides practical guidance for the agent to win in within-visual-range (WVR) air combat. First, we propose a dynamic situation assessor (DSA) based on an improved entropy weight method, which improves the accuracy, contrast, and objectivity of air combat situation evaluation. Second, we design a parameter optimizer (PO) based on an improved sparrow search algorithm (SSA) to automate parameter tuning for proximal policy optimization (PPO), and apply it to reward optimization, which significantly improves training efficiency. Finally, we evaluate our work through comparative experiments against state-of-the-art (SOTA) methods on a high-fidelity air combat simulation platform (JSBSim). The training process and win-rate tests confirm the effectiveness of the reward design paradigm and show good training efficiency compared with existing methods.
Keywords