Q-Sorting: An Algorithm for Reinforcement Learning Problems with Multiple Cumulative Constraints

Jianfeng Huang; Guoqiang Lu; Yi Li; Jiajun Wu

doi:10.3390/math12132001

Mathematics (Jun 2024)

Affiliations

Jianfeng Huang: College of Engineering, Shantou University, Shantou 515063, China
Guoqiang Lu: College of Engineering, Shantou University, Shantou 515063, China
Yi Li: College of Engineering, Shantou University, Shantou 515063, China
Jiajun Wu: College of Engineering, Shantou University, Shantou 515063, China

DOI: https://doi.org/10.3390/math12132001
Journal volume & issue: Vol. 12, no. 13, p. 2001

Abstract

This paper proposes Q-sorting, a method and algorithm for reinforcement learning (RL) problems with multiple cumulative constraints. The primary contribution is a mechanism for dynamically determining the focus of optimization among the multiple cumulative constraints and the objective. The action to execute is picked in two steps: first, actions that could break the constraints are filtered out; second, the remaining actions are sorted in descending order of the Q values of the focus. The algorithm was originally developed for the classic tabular value representation and the episodic setting of RL, but the idea can be extended to other methods with function approximation and the discounted setting. Numerical experiments are carried out on an adapted Gridworld and a motor speed synchronization problem, each with one and with two cumulative constraints. Simulation results validate the effectiveness of Q-sorting in that the cumulative constraints are honored both during and after the learning process. The advantages of Q-sorting are further highlighted through comparison with the method of lumped performances (LP), which takes constraints into account through weighting parameters. Q-sorting outperforms LP in both ease of use (no trial and error is needed to determine the values of the weighting parameters) and performance consistency (a standard deviation of the cumulative performance index of 6.1920 vs. 54.2635 rad/s over 10 repeated simulation runs). It has great potential for practical engineering use.
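The two-step selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how a potentially constraint-breaking action is detected or how the focus is chosen. The sketch therefore assumes that each constraint has its own estimate of the cumulative quantity an action would accrue (`q_constraints`) and a remaining allowance (`budgets`), and that the Q values of the current focus (`q_focus`) are already given. The function name `rank_actions` and the fallback used when no action passes the filter are likewise hypothetical.

```python
import numpy as np


def rank_actions(q_focus, q_constraints, budgets):
    """Filter-then-sort action ranking (illustrative sketch only).

    q_focus:       shape (n_actions,), Q values of the current focus.
    q_constraints: shape (n_constraints, n_actions), assumed estimates of the
                   cumulative constrained quantity accrued by each action.
    budgets:       shape (n_constraints,), assumed remaining allowance.
    Returns action indices, best first.
    """
    q_focus = np.asarray(q_focus, dtype=float)
    q_constraints = np.asarray(q_constraints, dtype=float)
    budgets = np.asarray(budgets, dtype=float)

    # Step 1: keep only actions whose estimates stay within every budget.
    feasible = np.all(q_constraints <= budgets[:, None], axis=0)
    candidates = np.flatnonzero(feasible)

    if candidates.size == 0:
        # Assumed fallback (not from the paper): if nothing passes the
        # filter, return the action with the smallest total overshoot.
        overshoot = np.clip(q_constraints - budgets[:, None], 0.0, None)
        return [int(np.argmin(overshoot.sum(axis=0)))]

    # Step 2: sort the remaining actions by the focus Q values, descending.
    order = candidates[np.argsort(-q_focus[candidates], kind="stable")]
    return [int(a) for a in order]


# Toy usage: four actions, two constraints; action 3 exceeds the first budget.
ranking = rank_actions(
    q_focus=[1.0, 3.0, 2.0, 5.0],
    q_constraints=[[0.5, 0.2, 0.9, 2.0], [1.0, 1.0, 1.0, 1.0]],
    budgets=[1.0, 1.5],
)
print(ranking)
```

In this toy call the action with the highest focus value (index 3) is removed by the filter, so the ranking is taken over the remaining three actions and its first entry would be the one executed.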

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics
