Deep Game of Escorting Suppressive Jamming and Networked Radar Power Allocation

Yuedong WANG; Yijing GU; Yan LIANG; Zengfu WANG; Huixia ZHANG

doi:10.12000/JR23023

Leida xuebao (Jun 2023)

Affiliations

Yuedong WANG: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Yijing GU: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Yan LIANG: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Zengfu WANG: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Huixia ZHANG: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China

DOI: https://doi.org/10.12000/JR23023
Journal volume & issue: Vol. 12, no. 3
pp. 642 – 656

Abstract

Traditional networked radar power allocation is typically optimized for a given jamming model, while jammer resource allocation is optimized for a given radar power allocation method; such studies lack game-theoretic interaction between the two sides. Given the increasingly intense combat scenarios in which radars and jammers compete, this study proposes a deep game problem of networked radar power allocation under escorting suppressive jamming, in which the intelligent target jamming strategy is trained using Deep Reinforcement Learning (DRL). First, the jammer and the networked radar are mapped as two agents in this problem. Based on the jamming model and the radar detection model, the target detection model of the networked radar under suppressive jamming is established, together with the optimization objective of maximizing the target detection probability. For the networked radar agent, the radar power allocation vector is generated by a Proximal Policy Optimization (PPO) policy network. For the jammer agent, a hybrid policy network is designed to simultaneously generate beam selection and power allocation actions. Domain knowledge is introduced to construct more effective reward functions: three kinds of domain knowledge, namely the target detection model, the equal power allocation strategy, and the greedy interference power allocation strategy, are employed to produce guided rewards for the networked radar agent and the jammer agent, thereby improving the learning efficiency and performance of both agents. Lastly, alternating training is used to learn the policy network parameters of both agents. The experimental results show that when the jammer adopts the DRL-based resource allocation strategy, the DRL-based networked radar power allocation is significantly better than the particle swarm-based and the artificial fish swarm-based networked radar power allocation in both target detection probability and run time.
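
As a rough illustration of the guided-reward idea summarized in the abstract, the sketch below maps raw policy outputs to a power allocation vector under a total power budget and scores it against the equal power allocation baseline. Everything in it is an assumption made for illustration, not the paper's formulation: the function names, the softmax mapping, the Swerling I single-pulse detection formula, the "at least one radar detects" fusion rule, and the reward defined as a difference from the baseline are all hypothetical stand-ins.

```python
import numpy as np

def softmax_power_allocation(logits, total_power):
    """Map raw policy outputs to non-negative powers that sum to total_power.
    (Hypothetical mapping; the paper's PPO output layer may differ.)"""
    z = np.asarray(logits, dtype=float)
    w = np.exp(z - z.max())  # subtract the max for numerical stability
    return total_power * w / w.sum()

def equal_power_allocation(n_radars, total_power):
    """Baseline strategy: split the power budget evenly across radars."""
    return np.full(n_radars, total_power / n_radars)

def fused_detection_probability(power, gain, jam_plus_noise, pfa=1e-6):
    """Assumed detection model: per-radar Swerling I single-pulse detection,
    fused with an 'at least one radar detects' rule."""
    power = np.asarray(power, dtype=float)
    gain = np.asarray(gain, dtype=float)
    jam_plus_noise = np.asarray(jam_plus_noise, dtype=float)
    sjnr = power * gain / jam_plus_noise   # signal-to-jamming-plus-noise ratio
    pd = pfa ** (1.0 / (1.0 + sjnr))       # per-radar detection probability
    return 1.0 - np.prod(1.0 - pd)

def guided_reward(power, gain, jam_plus_noise, total_power):
    """Assumed guided reward for the radar agent: improvement in fused
    detection probability over the equal power allocation baseline."""
    baseline = equal_power_allocation(len(power), total_power)
    return (fused_detection_probability(power, gain, jam_plus_noise)
            - fused_detection_probability(baseline, gain, jam_plus_noise))
```

Under these assumptions, a radar agent would call `softmax_power_allocation` on its policy output and receive `guided_reward` as a shaped training signal. A jammer agent could use the negated fused detection probability in the same way, with a greedy allocation as its baseline.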

Published in Leida xuebao

ISSN: 2095-283X (Print); 2097-339X (Online)
Publisher: China Science Publishing & Media Ltd. (CSPM)
Country of publisher: China
LCC subjects: Science: Physics: Electricity and magnetism
Website: https://radars.ac.cn/
