Chinese Journal of Engineering (Jul 2024)

Cooperative encirclement method for multiple unmanned ground vehicles based on reinforcement learning

  • Muqing SU,
  • Yin WANG,
  • Ruimin PU,
  • Meng YU

DOI
https://doi.org/10.13374/j.issn2095-9389.2023.09.15.004
Journal volume & issue
Vol. 46, no. 7
pp. 1237 – 1250

Abstract

Collaborative encirclement by multiple unmanned ground vehicles (UGVs) is a central problem in multiagent collaboration and underlies more complex tasks such as multiagent collaborative search and interception. Optimization algorithms have produced a rich body of results on collaborative encirclement, but they still suffer from poor real-time computational efficiency and weak robustness. Reinforcement learning holds considerable promise for multiagent sequential decision problems. This paper studies the collaborative encirclement of multiple UGVs using deep reinforcement learning. It establishes a kinematic model for the UGVs, describes the collaborative encirclement task and process, and develops an escape strategy for the target UGV. It also addresses the difficulties that arise as the number of UGVs grows and the environment becomes more complex: algorithmic instability, dimension explosion, and poor convergence.
To this end, the paper introduces a collaborative encirclement algorithm based on the soft actor–critic (SAC) framework. To address poor collaboration and weak generalization among multiple UGVs, long short-term memory (LSTM) is incorporated into the network structure to act as a memory for the UGVs. The LSTM captures and uses information from historical observation sequences and processes time-series data effectively, which supports more accurate decisions, promotes collaboration among UGVs, and enhances system stability. To handle the growth in state-space dimension and the low training efficiency during collaborative encirclement, an attention mechanism computes attention weights over the state space and selects the key states relevant to the task. This constrains the state-space dimension, keeps the network stable, yields stable and efficient collaboration among multiple UGVs, and improves training efficiency.
To address sparse rewards in collaborative encirclement tasks, a mixed reward function is proposed that splits the reward into individual and collaborative components. Individual rewards guide each UGV toward the target and incentivize its motion, whereas collaborative rewards motivate the group of UGVs to complete the encirclement together. UGVs therefore receive reward signals more frequently, which speeds up convergence. Simulation and experimental results show that the proposed method converges faster than SAC, with a 15.1% reduction in encirclement time and a 7.6% improvement in success rate. Finally, the improved algorithm is deployed on a UGV platform, and real-world experiments in typical encirclement scenarios validate its feasibility and effectiveness on embedded systems.
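As an illustration of the mixed reward idea described in the abstract, the following minimal Python sketch combines a dense individual term with a shared collaborative term. It is not the authors' reward function: the abstract gives no formulas, so the distance-progress term, the encirclement test (all pursuers within `capture_radius` and no angular gap of pi or more around the target), the weights, and every function and parameter name here are assumptions made for this example.

```python
import math


def individual_reward(prev_dist, dist, step_penalty=0.01):
    # Assumed dense term: positive when the pursuer moved closer to the target.
    return (prev_dist - dist) - step_penalty


def largest_angular_gap(pursuers, target):
    # Bearing of each pursuer as seen from the target, sorted around the circle.
    angles = sorted(math.atan2(py - target[1], px - target[0]) for px, py in pursuers)
    gaps = [b - a for a, b in zip(angles, angles[1:])]
    gaps.append(angles[0] + 2 * math.pi - angles[-1])  # wrap-around gap
    return max(gaps)


def collaborative_reward(pursuers, target, capture_radius=1.5, bonus=10.0):
    # Assumed team term: paid only when the target is both close to and
    # surrounded by the pursuers (hypothetical encirclement criterion).
    close = all(math.dist(p, target) <= capture_radius for p in pursuers)
    surrounded = largest_angular_gap(pursuers, target) < math.pi
    return bonus if (close and surrounded) else 0.0


def mixed_reward(prev_pursuers, pursuers, target, w_ind=1.0, w_col=1.0):
    # Every pursuer gets its own progress term plus the same shared team term.
    shared = collaborative_reward(pursuers, target)
    return [
        w_ind * individual_reward(math.dist(p0, target), math.dist(p1, target))
        + w_col * shared
        for p0, p1 in zip(prev_pursuers, pursuers)
    ]
```

In this sketch the individual term produces a signal at every step, while the collaborative bonus is paid only when the assumed encirclement condition holds. That mirrors the abstract's motivation of giving UGVs more frequent reward signals than a purely terminal reward would.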

Keywords