Frequency Agile Anti-Interference Technology Based on Reinforcement Learning Using Long Short-Term Memory and Multi-Layer Historical Information Observation

Weihao Shi; Shanhong Guo; Xiaoyu Cong; Weixing Sheng; Jing Yan; Jinkun Chen

doi:10.3390/rs15235467

Remote Sensing (Nov 2023)

Affiliations

Weihao Shi: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Shanhong Guo: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Xiaoyu Cong: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Weixing Sheng: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Jing Yan: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Jinkun Chen: School of Electronic and Optical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

DOI: https://doi.org/10.3390/rs15235467
Journal volume & issue: Vol. 15, no. 23, p. 5467

Abstract

In modern electronic warfare, radar intelligence has become increasingly crucial for dealing with complex interference environments. This paper combines radar frequency agility with reinforcement learning to achieve adaptive frequency hopping for radar anti-jamming. Unlike traditional reinforcement learning, which assumes a Markov decision process (MDP), the interaction between a radar and a jammer is a partially observable Markov decision process (POMDP): the partial observations available to the agent do not strictly satisfy the Markov property. To address this, the paper uses multiple layers of historical observations as the agent state. These historical observations can be viewed as a time series, and time-sensitive networks are employed to extract the temporal information embedded in them. In addition, the reward function is optimized so that the agent learns faster in the jammer sweep environment. Simulations show that optimizing the agent state, network structure, and reward function effectively helps the radar resist jamming.
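
The idea of stacking several layers of historical observations into one agent state can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name HistoryObservation, the one-hot encoding of the jammed channel, the window depth of 3, and the 4-channel band are all assumptions chosen for illustration, since the abstract does not specify the state design.

```python
from collections import deque


class HistoryObservation:
    """Keeps the last `depth` observations as one stacked agent state (hypothetical helper)."""

    def __init__(self, depth, obs_size):
        self.depth = depth
        self.obs_size = obs_size
        # Pad with zero vectors until real observations arrive.
        self.buffer = deque(
            [[0.0] * obs_size for _ in range(depth)], maxlen=depth
        )

    def push(self, obs):
        """Append the newest observation; the oldest one drops out automatically."""
        if len(obs) != self.obs_size:
            raise ValueError("unexpected observation size")
        self.buffer.append(list(obs))

    def state(self):
        """Return the window oldest-first, as a depth x obs_size nested list."""
        return [list(o) for o in self.buffer]


def one_hot(index, size):
    """Assumed observation encoding: which channel the jammer occupied."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# A jammer sweeping across 4 channels, observed once per pulse (assumed setup).
hist = HistoryObservation(depth=3, obs_size=4)
for jammed_channel in [0, 1, 2, 3]:
    hist.push(one_hot(jammed_channel, 4))

state = hist.state()
```

Here `state` holds the three most recent observations in time order, so the sweep direction is visible in the window even though no single observation reveals it. A recurrent network such as an LSTM could then consume this window as a short time series.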

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
