Dual-Layer Reinforcement Learning for Quadruped Robot Locomotion and Speed Control in Complex Environments

Yilin Zhang; Jiayu Zeng; Huimin Sun; Honglin Sun; Kenji Hashimoto

doi:10.3390/app14198697

Applied Sciences (Sep 2024)

Dual-Layer Reinforcement Learning for Quadruped Robot Locomotion and Speed Control in Complex Environments

Yilin Zhang,
Jiayu Zeng,
Huimin Sun,
Honglin Sun,
Kenji Hashimoto

Affiliations

Yilin Zhang: Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan
Jiayu Zeng: Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan
Huimin Sun: Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan
Honglin Sun: Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan
Kenji Hashimoto: Graduate School of Information, Production and Systems, Waseda University, Kitakyushu 808-0135, Japan

DOI: https://doi.org/10.3390/app14198697
Journal volume & issue: Vol. 14, no. 19
p. 8697

Abstract

Read online

Walking robots have been widely applied in complex terrains due to their good terrain adaptability and trafficability. However, in some environments (such as disaster relief, field navigation, etc.), although a single strategy can adapt to various environments, it is unable to strike a balance between speed and stability. Existing control schemes like model predictive control (MPC) and traditional incremental control can manage certain environments. However, they often cannot balance speed and stability well. These methods usually rely on a single strategy and lack adaptability for dynamic adjustment to different terrains. To address this limitation, this paper proposes an innovative double-layer reinforcement learning algorithm. This algorithm combines Deep Double Q-Network (DDQN) and Proximal Policy Optimization (PPO), leveraging their complementary strengths to achieve both fast adaptation and high stability in complex terrains. This algorithm utilizes terrain information and the robot’s state as observations, determines the walking speed command of the quadruped robot Unitree Go1 through DDQN, and dynamically adjusts the current walking speed in complex terrains based on the robot action control system of PPO. The speed command serves as a crucial link between the robot’s perception and movement, guiding how fast the robot should walk depending on the environment and its internal state. By using DDQN, the algorithm ensures that the robot can set an appropriate speed based on what it observes, such as changes in terrain or obstacles. PPO then executes this speed, allowing the robot to navigate in real time over difficult or uneven surfaces, ensuring smooth and stable movement. Then, the proposed model is verified in detail in Isaac Gym. Wecompare the distances walked by the robot using six different control methods within 10 s. The experimental results indicate that the method proposed in this paper demonstrates excellent speed adjustment ability in complex terrains. On the designed test route, the quadruped robot Unitree Go1 can not only maintain a high walking speed but also maintain a high degree of stability when switching between different terrains. Ouralgorithm helps the robot walk 25.5 m in 10 s, outperforming other methods.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords