IEEE Access (Jan 2019)

Deep Reinforcement Learning for Target Searching in Cognitive Electronic Warfare

  • Shixun You,
  • Ming Diao,
  • Lipeng Gao

DOI
https://doi.org/10.1109/ACCESS.2019.2905649
Journal volume & issue
Vol. 7
pp. 37432 – 37447

Abstract

The recent appreciation of deep reinforcement learning (DRL) arises from its successes in many domains, but its applications in practical engineering, including the optimization of control strategies in cognitive electronic warfare (CEW), remain unsatisfactory. CEW is a large and challenging undertaking, and because of the sensitivity of its data sources, few open studies have investigated it. Moreover, the spatial sparsity, continuous action spaces, and partially observable environments that arise in CEW greatly limit DRL algorithms, which depend strongly on state-value and action-value functions. In this paper, we use Python to build a 3-D space game named Explorer to simulate various CEW environments in which the electronic attacker is an unmanned combat air vehicle (UCAV) and the defender is an observation station, both of which are equipped with radar as the observation sensor. In our game, the UCAV must detect the target as early as possible in order to perform follow-up tracking and guidance tasks. To allow an "infant" UCAV to understand what "target searching" is, we train the UCAV's maneuvering strategies by means of well-designed reward shaping, a simplified constant-acceleration motion control, and a deep deterministic policy gradient (DDPG) algorithm based on a generative model and variational Bayesian estimation. The experimental results show that when the operating cycle is 0.2 s, the search success rate of the trained UCAV over 10,000 episodes improves by 33.36% relative to the benchmark, and the target destruction rate similarly improves by 57.84%.
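To make the actor-critic mechanism behind DDPG concrete, the following is a minimal NumPy sketch of the deterministic policy-gradient update on a hypothetical one-step (bandit) toy problem; it is not the paper's Explorer environment, and the paper's full method additionally uses a generative model, variational Bayesian estimation, experience replay, and target networks, none of which are reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(s, a):
    # Toy reward (assumed for illustration): best action is a = 2*s,
    # so the optimal linear actor gain is 2.0.
    return -(a - 2.0 * s) ** 2

theta = 0.0                      # actor: mu(s) = theta * s
k = 0.0                          # critic: Q(s, a) = -(a - k*s)^2
lr_actor, lr_critic = 0.05, 0.1
noise_std = 0.2                  # exploration noise added to the action

for _ in range(5000):
    s = rng.uniform(-1.0, 1.0)
    a = theta * s + noise_std * rng.standard_normal()

    # Critic step: gradient descent on the squared error (Q(s,a) - r)^2.
    r = reward(s, a)
    q = -(a - k * s) ** 2
    dq_dk = 2.0 * (a - k * s) * s
    k -= lr_critic * 2.0 * (q - r) * dq_dk

    # Actor step (deterministic policy gradient):
    # grad_theta J = dQ/da evaluated at a = mu(s), times dmu/dtheta = s.
    mu = theta * s
    dq_da = -2.0 * (mu - k * s)
    theta += lr_actor * dq_da * s

print(f"learned actor gain: {theta:.2f}")  # should approach 2.0
```

The exploration noise is what lets the critic's fitted gain `k` move toward the true optimum; the actor then chases the critic's estimate through the chain-rule gradient, which is the core idea DDPG scales up with neural networks.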

Keywords