IEEE Open Journal of Control Systems (Jan 2022)
Reinforcement Learning With Safety and Stability Guarantees During Exploration For Linear Systems
Abstract
Satisfying the safety and stability properties of reinforcement learning (RL) algorithms has been a long-standing challenge. These properties must hold even during learning, when exploration is required to collect rich data. However, ensuring the safety of actions when little is known about the system dynamics is a daunting challenge, since predicting the consequences of RL actions requires knowledge of the dynamics. This paper presents a novel RL scheme that ensures the safety and stability of linear systems during both the exploration and exploitation phases. To this end, a fast and data-efficient model-learning scheme with a convergence guarantee is employed simultaneously with an off-policy RL scheme that finds the optimal controller. An accurate bound on the model-learning error is derived, and its characteristics are exploited to form a novel adaptive robustified control barrier function (ARCBF), which guarantees that the system states remain in the safe set even when learning is incomplete. Consequently, once a mild rank condition is satisfied, the noisy input in the exploratory data-collection phase and the optimal controller in the exploitation phase are minimally altered so that the ARCBF criterion is satisfied, and safety is therefore guaranteed in both phases. It is also shown that, under the proposed RL framework, the model-learning error acts as a vanishing perturbation to the original system, so a stability guarantee is provided even during exploration, when noisy random inputs are applied to the system.
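To illustrate the "minimally altered" safety filtering described above, the following is a minimal sketch of a standard control-barrier-function (CBF) safety filter for a linear system. It is not the paper's ARCBF, which additionally robustifies the barrier condition against the derived model-learning error bound; the matrices A and B, the affine barrier h(x) = c^T x + d, and the gain alpha used here are illustrative assumptions.

```python
# Illustrative sketch only: a standard CBF safety filter for x_dot = A x + B u
# with an assumed affine barrier h(x) = c^T x + d (relative degree one in u).
# The paper's ARCBF additionally accounts for the model-learning error bound.
import numpy as np

def cbf_safety_filter(A, B, c, d, alpha, x, u_nom):
    """Minimally alter u_nom so that h_dot(x, u) + alpha * h(x) >= 0,
    where h(x) = c @ x + d and h_dot = c @ (A x + B u)."""
    h = c @ x + d
    a = B.T @ c                       # constraint gradient w.r.t. u
    b = -alpha * h - c @ (A @ x)      # constraint: a @ u >= b
    if a @ a == 0.0:                  # barrier not directly affected by u
        return u_nom
    slack = b - a @ u_nom
    if slack <= 0.0:                  # nominal input already satisfies the CBF condition
        return u_nom
    # Closed-form projection of u_nom onto the half-space {u : a @ u >= b}
    return u_nom + (slack / (a @ a)) * a

# Example (assumed values): double integrator with velocity limit x2 <= 1,
# i.e. h(x) = 1 - x2; the nominal input 3.0 is clipped to keep the state safe.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
c = np.array([0.0, -1.0])
u_safe = cbf_safety_filter(A, B, c, d=1.0, alpha=2.0,
                           x=np.array([0.0, 0.9]), u_nom=np.array([3.0]))
print(u_safe)  # -> [0.2], the minimally altered safe input
```

The same filtering step applies to both phases discussed in the abstract: during exploration it projects the noisy probing input, and during exploitation it projects the learned optimal controller's output.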
Keywords