Zhihui kongzhi yu fangzhen (Oct 2024)
Improved MATD3 algorithm and its adversarial application
Abstract
Improving the training performance of multi-agent systems has long been a focus of reinforcement learning research. Building on the multi-agent twin delayed deep deterministic policy gradient (MATD3) algorithm, a parameter sharing mechanism is introduced to improve training efficiency. In addition, to alleviate the inconsistency between real rewards and auxiliary rewards, a decay factor for the auxiliary rewards is proposed, drawing on the idea of curriculum learning; it preserves the incentive for policy exploration early in training and keeps the reward consistent with the real reward late in training. The improved MATD3 algorithm is then applied to a combat vehicle game to achieve intelligent decision-making for the vehicles. The application results show that the vehicles' reward curve converges stably, with good results. Compared with the original MATD3 algorithm in simulation, the improved algorithm achieves better convergence behavior and a higher converged reward value.
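The auxiliary-reward decay idea can be illustrated with a minimal sketch. The abstract does not state the form of the decay factor, so the linear schedule, the function names `auxiliary_weight` and `shaped_reward`, and the parameter `decay_episodes` are all assumptions made for illustration, not the authors' method.

```python
def auxiliary_weight(episode: int, decay_episodes: int) -> float:
    """Hypothetical decay factor: falls linearly from 1.0 to 0.0.

    The linear form is an assumption; the source only says that a decay
    factor is applied to the auxiliary reward.
    """
    if decay_episodes <= 0:
        raise ValueError("decay_episodes must be positive")
    return max(0.0, 1.0 - episode / decay_episodes)


def shaped_reward(real_reward: float, auxiliary_reward: float,
                  episode: int, decay_episodes: int = 1000) -> float:
    """Combine the real reward with a decayed auxiliary reward.

    Early in training the auxiliary term is fully active and encourages
    exploration; once the weight reaches zero, only the real reward remains.
    """
    weight = auxiliary_weight(episode, decay_episodes)
    return real_reward + weight * auxiliary_reward
```

With this schedule, the training signal matches the real reward exactly for every episode at or beyond `decay_episodes`, which is one simple way to obtain the late-training reward consistency described above.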
Keywords