Autonomous air combat decision‐making of UAV based on parallel self‐play reinforcement learning

Bo Li; Jingyi Huang; Shuangxia Bai; Zhigang Gan; Shiyang Liang; Neretin Evgeny; Shouwen Yao

doi:10.1049/cit2.12109

CAAI Transactions on Intelligence Technology (Mar 2023)

Affiliations

Bo Li: School of Electronics Information Northwestern Polytechnical University Xi'an China
Jingyi Huang: School of Electronics Information Northwestern Polytechnical University Xi'an China
Shuangxia Bai: School of Electronics Information Northwestern Polytechnical University Xi'an China
Zhigang Gan: School of Electronics Information Northwestern Polytechnical University Xi'an China
Shiyang Liang: Avic Luoyang Electro‐optical Equipment Research Institute Luoyang China
Neretin Evgeny: Moscow Aviation Institute 4 Volokolamskoe Highway Moscow Russia
Shouwen Yao: School of Mechanical Engineering Beijing Institute of Technology Beijing China

DOI: https://doi.org/10.1049/cit2.12109
Journal volume & issue: Vol. 8, no. 1, pp. 64–81

Abstract

To address manoeuvring decision‐making in UAV air combat, this study establishes a one‐to‐one air combat model, defines missile attack areas, and uses Soft Actor‐Critic (SAC), a non‐deterministic policy algorithm in deep reinforcement learning, to construct a decision model that carries out the manoeuvring process. The complexity of the proposed algorithm is calculated, and the stability of the closed‐loop air combat decision‐making system controlled by the neural network is analysed using a Lyapunov function. The study treats the UAV air combat process as a game and proposes a Parallel Self‐Play training SAC algorithm (PSP‐SAC) to improve the generalisation performance of UAV control decisions. Simulation results show that the proposed algorithm realises sample sharing and policy sharing across multiple combat environments and significantly improves the generalisation ability of the model compared with independent training.

Published in CAAI Transactions on Intelligence Technology

ISSN: 2468-2322 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/24682322
