IEEE Access (Jan 2024)

Path Planning of Mobile Robot in Dynamic Obstacle Avoidance Environment Based on Deep Reinforcement Learning

  • Qingfeng Zhang,
  • Wenpeng Ma,
  • Qingchun Zheng,
  • Xiaofan Zhai,
  • Wenqian Zhang,
  • Tianchang Zhang,
  • Shuo Wang

DOI
https://doi.org/10.1109/ACCESS.2024.3507016
Journal volume & issue
Vol. 12
pp. 189136 – 189152

Abstract

Mobile robots operating in complex environments face sparse rewards caused by limited effective experience, slow learning in the early stages of training, and poor obstacle avoidance when obstacles are dynamic. To address these issues, this study proposes a new path planning algorithm for mobile robots that introduces an Intrinsic Curiosity Module (ICM) and Long Short-Term Memory (LSTM) into the Proximal Policy Optimization (PPO) algorithm. The ICM provides intrinsic rewards in addition to external rewards, accelerating initial convergence. The Actor-Critic network is built on an LSTM-based neural network to improve the avoidance of dynamic obstacles. Simulation experiments were conducted in Gazebo in scenarios featuring both static and dynamic obstacles, and the TurtleBot3 mobile robot was used for experimental verification. The experiments demonstrate that, compared to traditional algorithms, the proposed algorithm significantly accelerates convergence in environments with sparse rewards, and that the robot reaches more target points within a single episode, indicating more effective path planning. It can also avoid obstacles in various states. Finally, the effectiveness of the algorithm was validated using the TurtleBot3 physical robot in real-world scenarios. Results show that, compared to Deep Deterministic Policy Gradient (DDPG), PPO, LSTM-PPO and ICM-PPO, the success rate of the proposed algorithm in path planning increases by 9%, 7.2%, 4% and 4.8%, respectively, in the most complex simulation environment, and by 20%, 16%, 10% and 12%, respectively, in the physical environment.
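
The abstract does not give the exact reward formulation, so the following is only a minimal sketch of how a curiosity bonus is commonly combined with an external reward in ICM-style methods. It assumes the usual formulation in which the intrinsic reward is the scaled squared error between the forward model's predicted next-state features and the actual ones. The function names, the scaling factor `eta`, and the weight `beta` are hypothetical and not taken from the paper.

```python
def intrinsic_reward(predicted_features, actual_features, eta=0.1):
    """Curiosity bonus: (eta / 2) * squared prediction error of a forward model.

    Both arguments are equal-length sequences of floats standing in for the
    encoded next state. eta is an assumed scaling factor.
    """
    squared_error = sum(
        (p - a) ** 2 for p, a in zip(predicted_features, actual_features)
    )
    return 0.5 * eta * squared_error


def total_reward(external_reward, predicted_features, actual_features,
                 eta=0.1, beta=1.0):
    """External reward plus a weighted curiosity bonus (beta is assumed)."""
    bonus = intrinsic_reward(predicted_features, actual_features, eta)
    return external_reward + beta * bonus


# With no external reward, a poorly predicted transition still yields a
# positive learning signal, which is the motivation in sparse-reward settings.
actual = [1.0, 0.0, 2.0]
predicted = [0.0, 0.0, 0.0]
reward = total_reward(0.0, predicted, actual)
```

In this sketch the bonus shrinks toward zero as the forward model learns to predict a transition, so the agent is nudged toward unfamiliar states early in training and relies mainly on the external reward later.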

Keywords