Research on the Deep Deterministic Policy Algorithm Based on the First-Order Inverted Pendulum

Hailin Hu; Yuhui Chen; Tao Wang; Fu Feng; Weijin Chen

doi:10.3390/app13137594

Applied Sciences (Jun 2023)

Affiliations

Hailin Hu: School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
Yuhui Chen: School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
Tao Wang: School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
Fu Feng: School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China
Weijin Chen: School of Electrical Engineering and Automation, Jiangxi University of Science and Technology, Ganzhou 341000, China

DOI: https://doi.org/10.3390/app13137594
Journal volume & issue: Vol. 13, no. 13, p. 7594

Abstract

As artificial intelligence technology matures, intelligent control algorithms are increasingly applied in control systems to meet the high-performance requirements of modern society. This paper proposes a controller design method based on the deep deterministic policy gradient (DDPG) algorithm, a deep reinforcement learning method, to improve system control performance. First, the optimal control policy of the DDPG algorithm is derived from the Markov decision process and the Actor–Critic algorithm. Second, to avoid the local optima encountered in traditional control systems, the capacity and the settlement method of the DDPG experience pool are adjusted so that positive experience is absorbed, which accelerates convergence and makes training more efficient. Third, to address the overestimation of the Q value in DDPG, the overall structure of the Critic network is changed, which shortens the convergence period of DDPG at low learning rates. Finally, a first-order inverted pendulum control system is constructed in a simulation environment to compare the control performance of PID, DDPG, and improved DDPG controllers. The simulation results show that, for the first-order inverted pendulum, the improved DDPG controller responds faster to disturbances and produces smaller displacement and angular displacement. The simulations further indicate that the improved DDPG algorithm has better stability and convergence, stronger anti-interference ability, and better stability recovery. This control method provides a reference for applying reinforcement learning in traditional control systems.
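
For readers unfamiliar with the components named in the abstract, the following is a minimal sketch of three building blocks of standard DDPG: a fixed-capacity experience pool with uniform sampling, the Critic's temporal-difference target, and the soft update of target-network parameters. It does not reproduce the paper's improved experience pool or modified Critic structure, whose details are not given in the abstract. All names (`ReplayBuffer`, `td_target`, `soft_update`) and the default values of `gamma` and `tau` are illustrative assumptions.

```python
import numpy as np


class ReplayBuffer:
    """Fixed-capacity experience pool with uniform sampling (standard DDPG,
    not the adjusted pool described in the abstract)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.data = []
        self.pos = 0  # index of the next slot to write
        self.rng = np.random.default_rng(seed)

    def add(self, transition):
        # Append until full, then overwrite the oldest entry.
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Uniform sampling with replacement over the stored transitions.
        idx = self.rng.integers(0, len(self.data), size=batch_size)
        return [self.data[i] for i in idx]


def td_target(reward, next_q, done, gamma=0.99):
    """Critic target y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    next_q is assumed to be the target Critic's value of the target
    Actor's action in the next state; gamma is an assumed discount factor.
    """
    return reward + gamma * (1.0 - done) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward the online parameter."""
    return [(1.0 - tau) * t + tau * o
            for t, o in zip(target_params, online_params)]
```

In a full training loop, transitions sampled from the pool would feed `td_target` to fit the Critic, and `soft_update` would be applied to both target networks after each gradient step.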

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
