Self-improving Q-learning based controller for a class of dynamical processes

Jakub Musial; Krzysztof Stebel; Jacek Czeczot

doi:10.24425/acs.2021.138691

Archives of Control Sciences (Sep 2021)

Affiliations

Jakub Musial: Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science, Department of Automatic Control and Robotics, 44-100 Gliwice, ul. Akademicka 16, Poland
Krzysztof Stebel: Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science, Department of Automatic Control and Robotics, 44-100 Gliwice, ul. Akademicka 16, Poland
Jacek Czeczot: Silesian University of Technology, Faculty of Automatic Control, Electronics and Computer Science, Department of Automatic Control and Robotics, 44-100 Gliwice, ul. Akademicka 16, Poland

DOI: https://doi.org/10.24425/acs.2021.138691

Abstract

This paper shows how the Q-learning algorithm can be applied as a general-purpose self-improving controller for industrial automation, as a substitute for a conventional PI controller implemented without proper tuning. The traditional Q-learning approach is redefined to better fit practical control loops, including a new definition of the goal state by the closed-loop reference trajectory and discretization of the state space and the accessible actions (manipulated variables). The properties of the Q-learning algorithm are investigated in terms of practical applicability, with special emphasis on initializing the Q-matrix based only on preliminary PI tunings to ensure bumpless switching between the existing controller and the Q-learning algorithm that replaces it. A general approach to the design of the Q-matrix and the learning policy is suggested, and the concept is systematically validated by simulation in the control of two example processes, one exhibiting first-order dynamics and one exhibiting oscillatory second-order dynamics. The results show that online learning through interaction with the controlled process is possible and that it significantly improves control performance compared with an arbitrarily tuned PI controller.

Published in Archives of Control Sciences

ISSN: 1230-2384 (Print); 2300-2611 (Online)
Publisher: Polish Academy of Sciences
Country of publisher: Poland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics
Website: http://journals.pan.pl/acs
