Time-aware MADDPG with LSTM for multi-agent obstacle avoidance: a comparative study

Enyu Zhao; Ning Zhou; Chanjuan Liu; Houfu Su; Yang Liu; Jinmiao Cong

doi:10.1007/s40747-024-01389-0

Complex & Intelligent Systems (Mar 2024)

Affiliations

Enyu Zhao: Department of Computer Science and Technology, Dalian University of Technology
Ning Zhou: Department of Computer Science and Technology, Dalian University of Technology
Chanjuan Liu: Department of Computer Science and Technology, Dalian University of Technology
Houfu Su: Department of Computer Science and Technology, Dalian University of Technology
Yang Liu: Department of Computer Science and Technology, Dalian University of Technology
Jinmiao Cong: Department of Computer Science and Technology, Dalian University of Technology

DOI: https://doi.org/10.1007/s40747-024-01389-0
Journal volume & issue: Vol. 10, no. 3, pp. 4141–4155

Abstract

Intelligent agents and multi-agent systems are increasingly used in complex scenarios, such as controlling groups of drones and non-player characters in video games. In these applications, multi-agent navigation and obstacle avoidance are foundational functions. However, these problems become more challenging as the environment grows more complex and the decision-making interactions among agents become more dynamic. The Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is a classical multi-agent reinforcement learning algorithm that has been used successfully to improve agents’ performance. However, it ignores the temporal information hidden in agents’ interactions with the environment, and its training technique makes it inefficient in scenarios with many agents. To address these limitations, we explore modified versions of MADDPG for multi-agent navigation and obstacle avoidance. By combining MADDPG with Long Short-Term Memory (LSTM), we obtain the MADDPG-LSTMactor algorithm, which feeds a sequence of consecutive observations to the policy network so that the LSTM layer can capture hidden temporal patterns. Moreover, by simplifying the input of the critic network, we obtain the MADDPG-L algorithm, which improves efficiency in scenarios with many agents. Experimental results demonstrate that these algorithms outperform existing networks in the OpenAI multi-agent particle environment. We also conducted a comparative study of the LSTM-based approach against Transformer and self-attention models on the task of multi-agent navigation and obstacle avoidance. The results reveal that Transformer and self-attention models do not consistently outperform LSTM, and that the LSTM-based model offers a favorable tradeoff across varying sequence lengths. Overall, this work addresses the limitations of MADDPG in multi-agent navigation and obstacle avoidance tasks, providing insights for developing intelligent agents and multi-agent systems.
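
As an illustration of the idea of feeding observations over time to a recurrent policy network, the following is a minimal sketch of how a fixed-length window of an agent's most recent observations might be assembled. It is not taken from the paper: the class name `ObservationWindow`, the window length, and the zero-padding at the start of an episode are all assumptions made for this example.

```python
from collections import deque
from typing import Deque, List, Sequence


class ObservationWindow:
    """Keeps the most recent `length` observations for one agent.

    Hypothetical helper for illustration; not part of MADDPG or the paper.
    """

    def __init__(self, length: int, obs_dim: int) -> None:
        if length < 1 or obs_dim < 1:
            raise ValueError("length and obs_dim must be positive")
        self.length = length
        self.obs_dim = obs_dim
        # deque with maxlen drops the oldest frame automatically on append.
        self._frames: Deque[List[float]] = deque(maxlen=length)
        self.reset()

    def reset(self) -> None:
        """Start a new episode: fill the window with zeros (assumed padding)."""
        self._frames.clear()
        for _ in range(self.length):
            self._frames.append([0.0] * self.obs_dim)

    def push(self, obs: Sequence[float]) -> List[List[float]]:
        """Append the newest observation and return the window, oldest first."""
        if len(obs) != self.obs_dim:
            raise ValueError("unexpected observation size")
        self._frames.append([float(x) for x in obs])
        return [list(frame) for frame in self._frames]


window = ObservationWindow(length=3, obs_dim=2)
window.push([1.0, 2.0])
sequence = window.push([3.0, 4.0])
print(sequence)  # [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]
```

In a setup like this, the returned list (of shape window length by observation size) would be the per-step input to a recurrent layer in the actor, in place of a single observation vector.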

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
