A Sequential Decision Algorithm of Reinforcement Learning for Composite Action Space

Yuan Gao; Ye Wang; Lei Zhang; Lihong Guo; Jiang Li; Shouhong Sun

doi:10.1109/ACCESS.2023.3320137

IEEE Access (Jan 2023)

Affiliations

Yuan Gao: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Ye Wang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Lei Zhang: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Lihong Guo: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Jiang Li: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Shouhong Sun: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China

DOI: https://doi.org/10.1109/ACCESS.2023.3320137
Journal volume & issue: Vol. 11
pp. 107669 – 107684

Abstract

Using UAV (Unmanned Aerial Vehicle) clusters to carry out electronic countermeasure tasks is a key research topic in electronic warfare. Each UAV carries several payloads at the same time, such as reconnaissance and interference equipment, so it must decide multiple types of actions simultaneously. These compound actions pose a challenge to intelligent decision-making algorithms. To address the dimensional complexity of the action space and the weak collaboration between decisions in multi-agent scenarios with composite actions, this study proposes a multi-agent reinforcement-learning sequential decision algorithm that converts the joint composite action into a sequence of decisions, reducing the difficulty of each single decision and strengthening collaboration both among agents and among each agent's individual decisions. Because long decision sequences require deeper models and exhibit high variance, a DeLighT module is added to the naïve transformer model to increase its depth, and baseline techniques are used to reduce the variance of the value estimation. Simulation results verify the effectiveness of the proposed algorithm in a UAV cooperative combat scenario in which each agent has a composite action space, where it performs better than existing algorithms.
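
The effect of turning one joint composite decision into a sequence of small decisions can be illustrated with a minimal sketch. This is not the authors' implementation: the sub-action names (`move`, `reconnaissance`, `interference`), their option counts, and the helper functions are all hypothetical and chosen only to show how the size of a single decision shrinks.

```python
from math import prod

# Hypothetical composite action space: each UAV picks one option per sub-action.
# Names and option counts are illustrative assumptions, not values from the paper.
SUB_ACTION_SIZES = {"move": 5, "reconnaissance": 3, "interference": 4}


def joint_space_size(n_agents, sub_sizes):
    """Number of joint composite actions if everything is chosen in one decision."""
    per_agent = prod(sub_sizes.values())
    return per_agent ** n_agents


def decision_sequence(n_agents, sub_sizes):
    """Flatten the joint composite action into an ordered list of small decisions.

    Each step is a tuple (agent index, sub-action name, number of options).
    """
    return [
        (agent, name, size)
        for agent in range(n_agents)
        for name, size in sub_sizes.items()
    ]


steps = decision_sequence(3, SUB_ACTION_SIZES)
print(len(steps))                             # number of sequential decisions
print(max(size for _, _, size in steps))      # options in the largest single decision
print(joint_space_size(3, SUB_ACTION_SIZES))  # options in one joint decision
```

Under these assumed sizes, a single joint decision would choose among 60 to the power of 3 alternatives, whereas the sequential form makes nine decisions with at most five options each. In a sequential scheme, each later decision can be conditioned on the earlier ones, which is one way the collaboration described in the abstract could arise.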

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
