Applied Sciences (Sep 2024)

Dual-Layer Reinforcement Learning for Quadruped Robot Locomotion and Speed Control in Complex Environments

  • Yilin Zhang,
  • Jiayu Zeng,
  • Huimin Sun,
  • Honglin Sun,
  • Kenji Hashimoto

DOI
https://doi.org/10.3390/app14198697
Journal volume & issue
Vol. 14, no. 19
p. 8697

Abstract

Read online

Walking robots have been widely applied in complex terrains due to their good terrain adaptability and trafficability. However, in some environments (such as disaster relief, field navigation, etc.), although a single strategy can adapt to various environments, it is unable to strike a balance between speed and stability. Existing control schemes like model predictive control (MPC) and traditional incremental control can manage certain environments. However, they often cannot balance speed and stability well. These methods usually rely on a single strategy and lack adaptability for dynamic adjustment to different terrains. To address this limitation, this paper proposes an innovative double-layer reinforcement learning algorithm. This algorithm combines Deep Double Q-Network (DDQN) and Proximal Policy Optimization (PPO), leveraging their complementary strengths to achieve both fast adaptation and high stability in complex terrains. This algorithm utilizes terrain information and the robot’s state as observations, determines the walking speed command of the quadruped robot Unitree Go1 through DDQN, and dynamically adjusts the current walking speed in complex terrains based on the robot action control system of PPO. The speed command serves as a crucial link between the robot’s perception and movement, guiding how fast the robot should walk depending on the environment and its internal state. By using DDQN, the algorithm ensures that the robot can set an appropriate speed based on what it observes, such as changes in terrain or obstacles. PPO then executes this speed, allowing the robot to navigate in real time over difficult or uneven surfaces, ensuring smooth and stable movement. Then, the proposed model is verified in detail in Isaac Gym. Wecompare the distances walked by the robot using six different control methods within 10 s. The experimental results indicate that the method proposed in this paper demonstrates excellent speed adjustment ability in complex terrains. On the designed test route, the quadruped robot Unitree Go1 can not only maintain a high walking speed but also maintain a high degree of stability when switching between different terrains. Ouralgorithm helps the robot walk 25.5 m in 10 s, outperforming other methods.

Keywords