IEEE Access (Jan 2023)

Variance-Based Exploration for Learning Model Predictive Control

  • Katrine Seel
  • Alberto Bemporad
  • Sebastien Gros
  • Jan Tommy Gravdahl

DOI
https://doi.org/10.1109/ACCESS.2023.3282842
Journal volume & issue
Vol. 11
pp. 60724–60736

Abstract


The combination of model predictive control (MPC) and learning methods has been attracting increasing attention as a tool to control systems that may be difficult to model. Using MPC as a function approximator in reinforcement learning (RL) is one approach to reduce the reliance on accurate models. RL depends on exploration to learn, and simple heuristics based on random perturbations are currently the most common approach. This paper considers variance-based exploration in RL, geared towards using MPC as the function approximator. We propose to use a non-probabilistic measure of uncertainty of the value function approximator in value-based RL methods. Uncertainty is measured by a variance estimate based on inverse distance weighting (IDW). The IDW framework is computationally cheap to evaluate and therefore well suited to an online setting, as it reuses already sampled state transitions and rewards. The gradient of the variance estimate is then used to perturb the policy parameters in the direction of increasing variance of the value function estimate. The proposed method is verified on two simulation examples, considering both linear and nonlinear system dynamics, and compared to standard exploration methods using random perturbations.
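The following is a minimal sketch, not the authors' implementation, of the idea described in the abstract: an IDW-based variance estimate built from previously sampled data, whose gradient with respect to the policy parameters gives an exploration direction. The squared-distance weighting, the finite-difference gradient, and the names (`idw_variance`, `beta`, the sample arrays) are illustrative assumptions.

```python
import numpy as np

def idw_weights(theta, thetas, eps=1e-12):
    """Normalized IDW weights w_i(theta) proportional to 1 / ||theta - theta_i||^2."""
    d2 = np.sum((thetas - theta) ** 2, axis=1) + eps
    w = 1.0 / d2
    return w / np.sum(w)

def idw_variance(theta, thetas, values):
    """IDW mean and variance estimate of the value samples at the query point theta."""
    w = idw_weights(theta, thetas)
    mean = np.dot(w, values)
    var = np.dot(w, (values - mean) ** 2)
    return mean, var

def variance_gradient(theta, thetas, values, h=1e-5):
    """Central finite-difference gradient of the IDW variance w.r.t. theta."""
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = h
        _, vp = idw_variance(theta + e, thetas, values)
        _, vm = idw_variance(theta - e, thetas, values)
        grad[k] = (vp - vm) / (2.0 * h)
    return grad

# Exploration step: perturb the policy parameters in the direction of
# increasing variance of the value estimate (step size beta is an assumption).
rng = np.random.default_rng(0)
thetas = rng.normal(size=(20, 3))   # previously visited policy parameters (illustrative data)
values = rng.normal(size=20)        # corresponding sampled value estimates (illustrative data)
theta = np.zeros(3)                 # current policy parameters
beta = 0.1
theta_explore = theta + beta * variance_gradient(theta, thetas, values)
```

In this sketch the variance is large in regions far from previously sampled parameters, so following its gradient drives exploration towards poorly covered parts of the parameter space, in contrast to undirected random perturbations.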

Keywords