Defence Technology (Jan 2024)

Recorded recurrent deep reinforcement learning guidance laws for intercepting endoatmospheric maneuvering missiles

  • Xiaoqi Qiu,
  • Peng Lai,
  • Changsheng Gao,
  • Wuxing Jing

Journal volume & issue
Vol. 31, pp. 457–470

Abstract


This work proposes a recorded recurrent twin delayed deep deterministic (RRTD3) policy gradient algorithm to address the challenge of constructing guidance laws for intercepting endoatmospheric maneuvering missiles under uncertainties and observation noise. The attack-defense engagement scenario is modeled as a partially observable Markov decision process (POMDP). Given the benefits of recurrent neural networks (RNNs) in processing sequence information, an RNN layer is incorporated into the agent's policy network to alleviate the bottleneck that traditional deep reinforcement learning methods face when dealing with POMDPs. Because the detection frequency of an interceptor is usually higher than its guidance frequency, the measurements from the interceptor's seeker during each guidance cycle are combined into one sequence as the input to the policy network. During training, the hidden states of the RNN layer in the policy network are recorded, overcoming the partial observability that this RNN layer would otherwise introduce inside the agent. The training curves show that the proposed RRTD3 improves data efficiency, training speed, and training stability. The test results confirm the advantages of the RRTD3-based guidance laws over several conventional guidance laws.
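The two mechanisms the abstract describes, batching a guidance cycle's seeker measurements into one input sequence and recording the recurrent hidden state for replay, can be illustrated with a minimal sketch. This is an assumption-laden toy (a hand-rolled one-layer RNN in NumPy, hypothetical dimensions, no critic or TD3 training loop), not the paper's actual network or algorithm:

```python
import numpy as np

class RecurrentPolicy:
    """Toy recurrent policy: maps the sequence of seeker measurements
    taken during one guidance cycle to a single guidance command.
    Dimensions and weight initialization are illustrative only."""
    def __init__(self, obs_dim, hidden_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_x = rng.normal(0.0, 0.1, (hidden_dim, obs_dim))
        self.W_h = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_a = rng.normal(0.0, 0.1, (act_dim, hidden_dim))
        self.hidden_dim = hidden_dim

    def act(self, obs_seq, h):
        # obs_seq: (T, obs_dim) -- all T seeker measurements from one
        # guidance cycle (detection rate exceeds guidance rate, so T > 1)
        for x in obs_seq:
            h = np.tanh(self.W_x @ x + self.W_h @ h)
        return np.tanh(self.W_a @ h), h  # bounded command, new hidden state

policy = RecurrentPolicy(obs_dim=4, hidden_dim=8, act_dim=1)
replay_buffer = []
h = np.zeros(policy.hidden_dim)  # hidden state at episode start
rng = np.random.default_rng(1)
for step in range(3):
    obs_seq = rng.normal(size=(5, 4))  # e.g. 5 measurements per cycle
    h_in = h.copy()                    # record hidden state BEFORE the cycle
    action, h = policy.act(obs_seq, h)
    # Storing h_in alongside the transition lets training replay it with
    # the true recurrent context rather than a zero-initialized state,
    # which is the "recorded" idea the abstract refers to.
    replay_buffer.append((obs_seq, h_in, action))
```

The key design point is that each stored transition carries its own pre-cycle hidden state, so minibatch updates sampled out of order can still reconstruct the recurrent context the policy actually had when it acted.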

Keywords