Balance Controller Design for Inverted Pendulum Considering Detail Reward Function and Two-Phase Learning Protocol

Xiaochen Liu; Sipeng Wang; Xingxing Li; Ze Cui

doi:10.3390/sym16091227

Symmetry (Sep 2024)

Affiliations

Xiaochen Liu: School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang 550025, China
Sipeng Wang: School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang 550025, China
Xingxing Li: School of Mechanical and Electrical Engineering, Guizhou Normal University, Guiyang 550025, China
Ze Cui: School of Control Science and Engineering, Dalian University of Technology, Dalian 116024, China

DOI: https://doi.org/10.3390/sym16091227
Journal volume & issue: Vol. 16, no. 9, p. 1227

Abstract

The inverted pendulum (IP) is a complex nonlinear system that is both asymmetric and unstable. In this paper, the IP system is controlled by a trained deep neural network (DNN) that maps the system states directly to control commands in an end-to-end manner. Building on deep reinforcement learning (DRL), a detail reward function (DRF) is designed to guide the DNN in learning the control strategy, which makes the control more targeted and flexible. Moreover, a two-phase learning protocol (an offline learning phase followed by an online learning phase) is proposed to address the “real gap” problem of the IP system. First, the DNN learns an offline control strategy based on a simplified IP dynamic model and the DRF. Then, a security controller is designed and used on the IP platform to optimize the DNN online. The experimental results demonstrate that the DNN is robust to model errors after this secondary learning on the platform: when the length of the pendulum is reduced or increased by 25%, the steady-state error of the pendulum angle remains below 0.05 rad, which is within the allowable range. The DRF and the two-phase learning protocol improve the adaptability of the controller to the complex and variable characteristics of the real platform and provide a reference for other learning-based robot control problems.
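
The abstract does not give the form of the DRF. As a rough illustration only, a shaped reward for a cart-pole style IP might look like the sketch below. The function name `detail_reward`, every weight, and both failure thresholds are hypothetical choices made for this example and are not taken from the paper.

```python
def detail_reward(theta, theta_dot, x, x_dot, force,
                  w_theta=1.0, w_theta_dot=0.1, w_x=0.5,
                  w_x_dot=0.05, w_force=0.001,
                  theta_fail=0.8, x_fail=2.4, fail_penalty=100.0):
    """Hypothetical shaped reward for one control step of an inverted pendulum.

    theta, theta_dot: pendulum angle (rad, 0 = upright) and angular velocity.
    x, x_dot:         cart position (m, 0 = track centre) and velocity.
    force:            control command applied at this step.
    All weights and thresholds are illustrative assumptions.
    """
    # Large fixed penalty once the pendulum falls or the cart leaves the track.
    if abs(theta) > theta_fail or abs(x) > x_fail:
        return -fail_penalty

    # Quadratic cost: each state component and the control effort is
    # penalised separately so its influence can be tuned on its own.
    cost = (w_theta * theta ** 2
            + w_theta_dot * theta_dot ** 2
            + w_x * x ** 2
            + w_x_dot * x_dot ** 2
            + w_force * force ** 2)

    # Survival bonus of 1.0 per step, reduced by the weighted cost.
    return 1.0 - cost
```

Under these assumptions the reward is highest when the pendulum is upright and the cart is centred and at rest, and it falls smoothly as any state component or the control effort grows, so a learning agent receives a graded signal rather than only a pass/fail outcome.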

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
