IEEE Access (Jan 2023)
Optimizing Reinforcement Learning Control Model in Furuta Pendulum and Transferring it to Real-World
Abstract
Reinforcement learning does not require an explicit robot model because it learns from data, but it faces temporal and spatial constraints when applied in real-world environments. In this research, we trained a controller for the Furuta pendulum balancing problem, which is difficult to model, in a virtual environment (Unity) and transferred it to the real world. The challenge of the Furuta pendulum balancing problem is to keep the pendulum's end effector in a vertical position. We resolved the temporal and spatial constraints by performing reinforcement learning in a virtual environment. Furthermore, we designed a novel reward function that enabled faster and more stable learning than two existing reward functions. We validated each reward function by applying it to the soft actor-critic (SAC) and proximal policy optimization (PPO) algorithms. The experimental results show that the cosine reward function trains faster and more stably. Finally, the SAC model trained with the cosine reward function in the virtual environment serves as the optimized controller. Additionally, we evaluated the robustness of this model by transferring it to the real environment.
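For intuition, a cosine-shaped balancing reward is typically a smooth function of the pendulum angle measured from the upright position, peaking when the end effector is vertical. The sketch below illustrates this general idea only; the angle convention and the absence of any scaling terms are assumptions, since the abstract does not give the paper's exact formulation.

    import numpy as np

    def cosine_reward(pendulum_angle: float) -> float:
        # pendulum_angle: radians from the upright (vertical) position.
        # Reward peaks at 1.0 when the pendulum is perfectly vertical and
        # decreases smoothly as the pendulum deviates, giving a dense,
        # well-shaped signal for SAC/PPO training. Illustrative sketch,
        # not the paper's verified reward definition.
        return float(np.cos(pendulum_angle))

    # Example: upright gives maximal reward, horizontal gives zero.
    print(cosine_reward(0.0))        # 1.0
    print(cosine_reward(np.pi / 2))  # ~0.0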
Keywords