An Improved Approach towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning

Kaifang Wan; Dingwei Wu; Yiwei Zhai; Bo Li; Xiaoguang Gao; Zijian Hu

doi:10.3390/e23111433

Entropy (Oct 2021)

Affiliations

Kaifang Wan: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Dingwei Wu: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Yiwei Zhai: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Bo Li: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Xiaoguang Gao: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
Zijian Hu: School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China

DOI: https://doi.org/10.3390/e23111433
Journal volume & issue: Vol. 23, no. 11, p. 1433

Abstract

The pursuit–evasion game is a classical maneuver confrontation problem in the multi-agent systems (MASs) domain. This paper develops an online decision technique based on deep reinforcement learning (DRL) to address environment sensing and decision-making in pursuit–evasion games. A control-oriented framework built on the multi-agent deep deterministic policy gradient (MADDPG) algorithm implements multi-agent cooperative decision-making, avoiding the tedious state variables that traditional, complicated modeling processes require. To address the effects of errors between the model and the real scenario, the paper introduces adversarial disturbances and proposes a novel algorithm, adversarial attack trick and adversarial learning MADDPG (A2-MADDPG). Applying an adversarial attack trick to the agents themselves models the uncertainties of the real world, thereby improving the robustness of training. During training, adversarial learning is incorporated into the algorithm to preprocess the actions of multiple agents, which enables them to respond properly to uncertain dynamic changes in MASs. Experimental results verify that the proposed approach provides superior performance and effectiveness for both pursuers and evaders, and that both sides can learn their corresponding confrontational strategies during training.

Published in Entropy

ISSN: 1099-4300 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Astronomy: Astrophysics; Science: Physics
Website: http://www.mdpi.com/journal/entropy
