Remote Sensing (Aug 2024)

Efficient Jamming Policy Generation Method Based on Multi-Timescale Ensemble Q-Learning

  • Jialong Qian,
  • Qingsong Zhou,
  • Zhihui Li,
  • Zhongping Yang,
  • Shasha Shi,
  • Zhenjia Xu,
  • Qiyun Xu

DOI
https://doi.org/10.3390/rs16173158
Journal volume & issue
Vol. 16, no. 17
p. 3158

Abstract


With the advancement of radar technology toward multifunctionality and cognitive capability, traditional radar countermeasures are no longer sufficient to counter advanced multifunctional radar (MFR) systems. Rapid and accurate generation of an optimal jamming strategy is therefore a key technology for effective radar countermeasures. To enhance the efficiency and accuracy of jamming policy generation, this paper proposes an efficient jamming policy generation method based on multi-timescale ensemble Q-learning (MTEQL). First, the task of generating jamming strategies is framed as a Markov decision process (MDP) by constructing a countermeasure scenario between the jammer and the radar and analyzing the principles of radar operation mode transitions. Then, multiple structure-dependent Markov environments are created from the real-world adversarial interactions between jammers and radars. Q-learning is executed concurrently in these environments, and the results are merged through an adaptive weighting mechanism based on the Jensen–Shannon divergence (JSD). Ultimately, a low-complexity, near-optimal jamming policy is derived. Simulation results show that the proposed method outperforms the standard Q-learning algorithm in jamming policy generation, achieving shorter jamming decision-making times and a lower average strategy error rate.
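The ensemble step described in the abstract can be illustrated with a minimal sketch: run tabular Q-learning independently in several Markov environments, derive a softmax policy from each learned Q-table, and weight each table by how close its policy is (in Jensen–Shannon divergence) to the ensemble's mean policy. The exact MTEQL environment construction and weighting formula are not given in the abstract, so the transition/reward tables, the softmax policy extraction, and the `exp(-JSD)` weighting below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions."""
    p = (p + eps) / (p + eps).sum()
    q = (q + eps) / (q + eps).sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def q_learning(P, R, n_states, n_actions,
               episodes=200, steps=50, alpha=0.1, gamma=0.9,
               epsilon=0.2, rng=None):
    """Tabular Q-learning on a deterministic toy MDP.

    P[s, a] -> next state, R[s, a] -> reward (stand-ins for one
    jammer-radar interaction environment).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(steps):
            # Epsilon-greedy action selection.
            a = int(rng.integers(n_actions)) if rng.random() < epsilon \
                else int(Q[s].argmax())
            s2, r = int(P[s, a]), R[s, a]
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q

def jsd_ensemble(Qs):
    """Merge per-environment Q-tables with JSD-based adaptive weights."""
    # Softmax policy per state for each learned Q-table.
    policies = [np.exp(Q) / np.exp(Q).sum(axis=1, keepdims=True) for Q in Qs]
    mean_pi = np.mean(policies, axis=0)
    # Average per-state JSD between each policy and the ensemble mean.
    ds = np.array([np.mean([js_divergence(pi[s], mean_pi[s])
                            for s in range(pi.shape[0])])
                   for pi in policies])
    # Closer-to-consensus learners get larger weights (assumed scheme).
    w = np.exp(-ds)
    w /= w.sum()
    return sum(wi * Q for wi, Q in zip(w, Qs)), w
```

A usage pattern would be to build one `(P, R)` pair per timescale or radar-interaction model, call `q_learning` on each, and read the final jamming policy off `jsd_ensemble(...)[0].argmax(axis=1)`.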

Keywords