Multi-Agent Collaborative Target Search Based on the Multi-Agent Deep Deterministic Policy Gradient with Emotional Intrinsic Motivation

Xiaoping Zhang; Yuanpeng Zheng; Li Wang; Arsen Abdulali; Fumiya Iida

doi:10.3390/app132111951

Applied Sciences (Nov 2023)

Affiliations

Xiaoping Zhang: School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
Yuanpeng Zheng: School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
Li Wang: School of Electrical and Control Engineering, North China University of Technology, Beijing 100144, China
Arsen Abdulali: Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK
Fumiya Iida: Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK

DOI: https://doi.org/10.3390/app132111951
Journal volume & issue: Vol. 13, no. 21, p. 11951

Abstract

Multi-agent collaborative target search is one of the main challenges in the multi-agent field, and deep reinforcement learning (DRL) is a promising way to learn such a task. However, DRL often faces the problem of sparse rewards, which to some extent reduces its learning efficiency. Introducing intrinsic motivation has proved to be a useful way to compensate for sparse rewards in DRL. Building on the multi-agent deep deterministic policy gradient (MADDPG) structure, this paper proposes a new MADDPG algorithm with emotional intrinsic motivation, named MADDPG-E, for multi-agent collaborative target search. In MADDPG-E, a new emotional intrinsic motivation module with three emotions (joy, sadness, and fear) is designed. The three emotions are defined by mapping the corresponding psychological knowledge onto the situations that the embodied agents encounter in the environment. An emotional steady-state variable function H is then designed to help judge how good the emotions are. Based on H, an emotion-based intrinsic reward function is finally proposed. With the designed emotional intrinsic motivation module, the multi-agent system always tries to make itself joyful, which means it always learns to search for the target. To show the effectiveness of the proposed MADDPG-E algorithm, two kinds of simulation experiments, with predetermined initial positions and random initial positions, respectively, are carried out, and comparisons are made with MADDPG as well as MADDPG-ICM (MADDPG with an intrinsic curiosity module). The results show that, with the designed emotional intrinsic motivation module, MADDPG-E learns faster and more stably, and the advantage is more pronounced in complex situations.
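
The abstract does not give the actual definitions of the three emotions, of H, or of the intrinsic reward, so the following is only a minimal illustrative sketch of the general idea: a sparse extrinsic reward is supplemented by an intrinsic term derived from an emotional state. Every formula, name, weight, and parameter here (emotions_from_state, steady_state_h, total_reward, safe_radius, beta) is a hypothetical assumption made for illustration and should not be read as the MADDPG-E formulation.

```python
from dataclasses import dataclass


@dataclass
class Emotions:
    # All three definitions are hypothetical stand-ins, not the paper's.
    joy: float      # assumed: grows when the agent moves closer to the target
    sadness: float  # assumed: grows when the agent moves away from the target
    fear: float     # assumed: grows when the agent is near an obstacle


def emotions_from_state(dist_to_target, prev_dist_to_target,
                        dist_to_obstacle, safe_radius=1.0):
    """Map simple geometric quantities to emotion intensities (assumed mapping)."""
    progress = prev_dist_to_target - dist_to_target
    joy = max(progress, 0.0)
    sadness = max(-progress, 0.0)
    # Fear is 0 outside safe_radius and rises linearly to 1 at contact.
    fear = max(safe_radius - dist_to_obstacle, 0.0) / safe_radius
    return Emotions(joy, sadness, fear)


def steady_state_h(e, w_joy=1.0, w_sad=1.0, w_fear=1.0):
    """Assumed scalar summary H: positive when joy dominates, negative otherwise."""
    return w_joy * e.joy - w_sad * e.sadness - w_fear * e.fear


def total_reward(extrinsic, h, beta=0.1):
    """Sparse extrinsic reward plus a scaled intrinsic term (assumed form)."""
    return extrinsic + beta * h


# Usage: an agent that moved closer to the target and is far from obstacles
# gets a positive intrinsic signal even though the extrinsic reward is zero.
e = emotions_from_state(dist_to_target=2.0, prev_dist_to_target=3.0,
                        dist_to_obstacle=2.0)
r = total_reward(extrinsic=0.0, h=steady_state_h(e))
```

In a MADDPG-style training loop, a shaped reward of this kind would be stored in the replay buffer in place of the raw environment reward; how MADDPG-E actually does this is described in the full paper, not in the abstract.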

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
