Defence Technology (Jan 2024)

Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles

  • Xiaoqi Qiu,
  • Peng Lai,
  • Changsheng Gao,
  • Wuxing Jing

Journal volume & issue
Vol. 31, pp. 457–470

Abstract


This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to address the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Because the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded, overcoming the partial observability that this RNN layer would otherwise introduce inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over several conventional guidance laws.
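The two mechanisms the abstract describes, batching a guidance cycle's seeker measurements into one input sequence and recording the recurrent hidden state for replay, can be illustrated with a minimal sketch. This is an assumption-laden toy (a hand-rolled one-layer RNN in NumPy, hypothetical dimensions, no critic or TD3 training loop), not the paper's actual network or algorithm:

```python
import numpy as np

class RecurrentPolicy:
    """Toy recurrent policy: maps the sequence of seeker measurements
    taken during one guidance cycle to a single guidance command.
    Dimensions and weight initialization are illustrative only."""
    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(0.0, 0.1, (hidden_dim, obs_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_a = rng.normal(0.0, 0.1, (act_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def act(self, obs_seq, h):
        # obs_seq: (T, obs_dim) -- all T seeker measurements from one
        # guidance cycle (detection rate exceeds guidance rate, so T > 1)
        for x in obs_seq:
            h = np.tanh(self.W_x @ x + self.W_h @ h)
        return np.tanh(self.W_a @ h), h  # bounded command, new hidden state

policy = RecurrentPolicy(obs_dim=4, hidden_dim=8, act_dim=1)
replay_buffer = []
h = np.zeros(policy.hidden_dim)  # hidden state at episode start
rng = np.random.default_rng(1)
for step in range(3):
    obs_seq = rng.normal(size=(5, 4))  # e.g. 5 measurements per cycle
    h_in = h.copy()                    # record hidden state BEFORE the cycle
    action, h = policy.act(obs_seq, h)
    # Storing h_in alongside the transition lets training replay it with
    # the true recurrent context rather than a zero-initialized state,
    # which is the "recorded" idea the abstract refers to.
    replay_buffer.append((obs_seq, h_in, action))
```

The key design point is that each stored transition carries its own pre-cycle hidden state, so minibatch updates sampled out of order can still reconstruct the recurrent context the policy actually had when it acted.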

Keywords