Complex & Intelligent Systems (Nov 2024)
MOOR: Model-based offline policy optimization with a risk dynamics model
Abstract
Offline reinforcement learning (RL) is widely used in safety-critical domains because it avoids dangerous and costly online interaction. A significant challenge is handling the uncertainty and risk that lie outside the offline data. Risk-sensitive offline RL attempts to address this issue through risk aversion. However, current model-based approaches use dynamics models that extract only state-transition and reward information; they cannot capture the risk information implicit in offline data and may therefore misuse high-risk data. In this work, we propose a model-based offline policy optimization approach with a risk dynamics model (MOOR). Specifically, we construct a risk dynamics model using a quantile network that learns the risk information of the data, and we then reshape model-generated data based on the errors of the risk dynamics model and the risk information of the data. Finally, we learn the policy with a risk-averse algorithm on the combined dataset of offline and generated data. We theoretically prove that MOOR can identify the risk information of data and avoid utilizing high-risk data. Our experiments show that MOOR outperforms existing approaches and achieves state-of-the-art results on risk-sensitive D4RL and risky navigation tasks.
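Quantile networks such as the one underlying the proposed risk dynamics model are typically trained with the pinball (quantile-regression) loss, whose minimizer is a target quantile rather than the mean. The following is a minimal NumPy sketch of that loss, with illustrative quantile levels and dummy data; the function name and the choice of 19 quantile levels are assumptions for illustration, not details from the paper.

```python
import numpy as np

def quantile_loss(pred, target, tau):
    """Pinball (quantile-regression) loss for quantile level tau in (0, 1).

    Penalizes under- and over-estimates asymmetrically, so the minimizer
    of the expected loss is the tau-quantile of the target distribution.
    """
    diff = target - pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# A network head that outputs one value per quantile level can be trained
# by summing this loss over a grid of taus (hypothetical setup):
taus = np.linspace(0.05, 0.95, 19)   # 19 quantile levels (illustrative)
preds = np.zeros_like(taus)          # dummy predictions for one sample
target = 1.0                         # dummy scalar target
total = sum(quantile_loss(preds[i], target, t) for i, t in enumerate(taus))
```

Spreading predictions across a grid of quantile levels is what lets such a model expose tail behavior (e.g. low quantiles of return), which a mean-only dynamics model cannot represent.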
Keywords