Applied Sciences (Feb 2025)
Extended Maximum Actor–Critic Framework Based on Policy Gradient Reinforcement for System Optimization
Abstract
Recently, significant research effort has been directed toward leveraging Artificial Intelligence for sensor data processing and system control. In particular, determining optimal paths and trajectories from sensor data is essential for effective control systems. For instance, model-predictive control based on Proportional-Integral-Derivative models is intuitive, efficient, and delivers strong control performance. However, tracking challenges persist, which calls for active research on integrating and optimizing control systems from a Machine Learning perspective. Specifically, Reinforcement Learning, a branch of Machine Learning, has been applied in several research fields to solve optimal control problems. In this paper, we propose an Extended Maximum Actor–Critic, a Reinforcement Learning-based method that combines the advantages of value-based and policy-based approaches to enhance the learning stability of the actor–critic framework for system control optimization. The proposed method integrates the actor and a maximized actor in the learning process to evaluate and select the actions with the highest value, facilitating effective exploration. Additionally, to improve the efficiency and robustness of agent learning, we propose Prioritized Hindsight Experience Replay (PHER), which combines the advantages of Prioritized Experience Replay (PER) and Hindsight Experience Replay (HER). To verify the approach, we evaluated training stability in the MuJoCo physics simulator, a standard benchmark platform for Reinforcement Learning. The proposed PHER method significantly outperforms the standard replay buffer and PER in benchmark tasks such as the simple HalfCheetah-v4 and the more complex Ant-v4.
Moreover, Prioritized Hindsight Experience Replay achieves a higher success rate than PER in FetchReach-v2, demonstrating the effectiveness of our proposed method in environments with more complex reward structures.
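The combination described above can be illustrated with a minimal sketch: transitions are stored both with their original goal and with a hindsight goal relabeled from the episode's final achieved state (the "final" HER strategy), and sampling is skewed toward high-priority (high TD-error) transitions as in PER. All class and method names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class PrioritizedHindsightReplay:
    """Sketch of a replay buffer combining PER-style priority
    sampling with HER-style goal relabeling (hypothetical API)."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.buffer = []        # transitions: (state, action, reward, next_state, goal)
        self.priorities = []    # one priority per stored transition

    def add_episode(self, episode):
        """Store each transition twice: once with the original goal,
        once relabeled with the episode's achieved final state."""
        achieved_goal = episode[-1][3]  # last next_state of the episode
        for (s, a, r, s2, g) in episode:
            self._add((s, a, r, s2, g))
            # Hindsight relabel: treat the achieved state as the goal,
            # giving a success reward (0) when the transition reaches it.
            hindsight_r = 0.0 if np.allclose(s2, achieved_goal) else -1.0
            self._add((s, a, hindsight_r, s2, achieved_goal))

    def _add(self, transition):
        max_p = max(self.priorities, default=1.0)  # new samples get max priority
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size):
        # PER-style sampling: probability proportional to priority^alpha.
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-5):
        # After a learning step, reset priorities to the new |TD errors|.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps
```

In this sketch, sparse-reward episodes that never reach the original goal still produce informative (relabeled) transitions via HER, while PER concentrates updates on the transitions the critic currently predicts worst.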
Keywords