Applied Sciences (Feb 2025)
Extended Maximum Actor–Critic Framework Based on Policy Gradient Reinforcement for System Optimization
Abstract
Recently, significant research effort has been directed toward leveraging Artificial Intelligence for sensor data processing and system control. In particular, determining optimal paths and trajectories from sensor data is essential for effective control systems. For instance, model-predictive control based on Proportional-Integral-Derivative models is intuitive, efficient, and delivers strong control performance. However, tracking challenges persist, which calls for active research on integrating and optimizing control systems from a Machine Learning perspective. Specifically, Reinforcement Learning, a branch of Machine Learning, has been applied in several research fields to solve optimal control problems. In this paper, we propose an Extended Maximum Actor–Critic, a Reinforcement Learning-based method that combines the advantages of value-based and policy-based approaches to enhance the learning stability of the actor–critic framework for system control optimization. The proposed method integrates the actor and a maximized actor in the learning process to evaluate and select the actions with the highest value, facilitating effective exploration. Additionally, to improve the efficiency and robustness of agent learning, we propose Prioritized Hindsight Experience Replay (PHER), which combines the advantages of Prioritized Experience Replay (PER) and Hindsight Experience Replay (HER). To verify the approach, we evaluated training stability in the MuJoCo physics simulator, a standard benchmark platform for Reinforcement Learning. The proposed PHER method significantly outperforms the standard replay buffer and PER in benchmark tasks such as the simple HalfCheetah-v4 and the more complex Ant-v4.
Moreover, Prioritized Hindsight Experience Replay achieves a higher success rate than PER in FetchReach-v2, demonstrating the effectiveness of our proposed method in environments with more complex reward structures.
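The combination described above can be illustrated with a minimal sketch: transitions are stored both with their original goal and with a hindsight goal relabeled from the episode's final achieved state (the "final" HER strategy), and sampling is skewed toward high-priority (high TD-error) transitions as in PER. All class and method names here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class PrioritizedHindsightReplay:
    """Sketch of a replay buffer combining PER-style priority
    sampling with HER-style goal relabeling (hypothetical API)."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.buffer = []        # transitions: (state, action, reward, next_state, goal)
        self.priorities = []    # one priority per stored transition

    def add_episode(self, episode):
        """Store each transition twice: once with the original goal,
        once relabeled with the episode's achieved final state."""
        achieved_goal = episode[-1][3]  # last next_state of the episode
        for (s, a, r, s2, g) in episode:
            self._add((s, a, r, s2, g))
            # Hindsight relabel: treat the achieved state as the goal,
            # giving a success reward (0) when the transition reaches it.
            hindsight_r = 0.0 if np.allclose(s2, achieved_goal) else -1.0
            self._add((s, a, hindsight_r, s2, achieved_goal))

    def _add(self, transition):
        max_p = max(self.priorities, default=1.0)  # new samples get max priority
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(max_p)

    def sample(self, batch_size):
        # PER-style sampling: probability proportional to priority^alpha.
        probs = np.array(self.priorities) ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors, eps=1e-5):
        # After a learning step, reset priorities to the new |TD errors|.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps
```

In this sketch, sparse-reward episodes that never reach the original goal still produce informative (relabeled) transitions via HER, while PER concentrates updates on the transitions the critic currently predicts worst.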
Keywords