Learning Bipedal Walking for Humanoids With Current Feedback

Rohan P. Singh; Zhaoming Xie; Pierre Gergondet; Fumio Kanehiro

doi:10.1109/ACCESS.2023.3301175

IEEE Access (Jan 2023)

Affiliations

Rohan P. Singh: CNRS-AIST JRL (Joint Robotics Laboratory), IRL, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
Zhaoming Xie: Department of Computer Science, Stanford University, Stanford, CA, USA
Pierre Gergondet: CNRS-AIST JRL (Joint Robotics Laboratory), IRL, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan
Fumio Kanehiro: CNRS-AIST JRL (Joint Robotics Laboratory), IRL, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki, Japan

Journal volume & issue: Vol. 11
pp. 82013 – 82023

Abstract

Recent advances in deep reinforcement learning (RL)-based techniques, combined with training in simulation, have offered a new approach to developing robust controllers for legged robots. However, the application of such approaches to real hardware has largely been limited to quadrupedal robots with direct-drive actuators and lightweight bipedal robots with low-gear-ratio transmission systems. Application to real, life-sized humanoid robots has been less common, arguably due to a large sim2real gap. In this paper, we present an approach for effectively overcoming the sim2real gap for humanoid robots that arises from inaccurate torque tracking at the actuator level. Our key idea is to utilize the current feedback from the actuators on the real robot, after training the policy in a simulation environment artificially degraded with poor torque tracking. Our approach successfully trains a unified, end-to-end policy in simulation that can be deployed on a real HRP-5P humanoid robot to achieve bipedal locomotion. Through ablations, we also show that a feedforward policy architecture combined with targeted dynamics randomization is sufficient for zero-shot sim2real success, thus eliminating the need for computationally expensive, memory-based network architectures. Finally, we validate the robustness of the proposed RL policy by comparing its performance against a conventional model-based controller for walking on uneven terrain with the real robot.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
