National Science Open (Mar 2024)

Learning the continuous-time optimal decision law from discrete-time rewards

  • Chen Ci
  • Xie Lihua
  • Xie Kan
  • Lewis Frank Leroy
  • Liu Yilu
  • Xie Shengli

DOI
https://doi.org/10.1360/nso/20230054
Journal volume & issue
Vol. 3

Abstract

The concept of reward is fundamental in reinforcement learning, with a wide range of applications across the natural and social sciences. Finding an interpretable reward for decision-making, one that largely shapes the system's behavior, has long been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which describe many phenomena governed by physical laws. We find that the discrete-time reward extracts the unique continuous-time decision law and improves computational efficiency by dropping the integral operator that appears in classical results with integral rewards. We apply this finding to output-feedback design problems in power systems, and the results show that our approach removes the intermediate stage of identifying a dynamical model. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, providing a computational tool to understand and modify the behavior of large-scale engineering systems through the learned optimal decision.
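To make the abstract's contrast concrete, the sketch below (a minimal illustration, not the paper's algorithm) compares the classical integral reward with a sampled discrete-time reward on a hypothetical continuous-time linear-quadratic problem. The matrices A, B, Q, R, the sampling period T, and all function names are assumptions introduced for illustration; the point is only that the sampled reward needs no per-interval integration of the reward signal, whereas the integral reward does.

```python
# Minimal sketch (assumed example, not the paper's method): compare the
# classical integral reward with a sampled discrete-time reward for a
# continuous-time linear-quadratic regulation problem.
import numpy as np
from scipy.integrate import trapezoid
from scipy.linalg import solve_continuous_are

# Hypothetical continuous-time linear system dx/dt = A x + B u
# with quadratic stage reward r(x, u) = x'Qx + u'Ru.
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.eye(1)

# Optimal feedback u = -K x from the continuous-time Riccati equation,
# so the true cost-to-go from x0 is x0' P x0.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

def simulate(x0, dt, steps):
    """Euler-integrate the closed loop; return state and input logs.
    The plant is simulated only to generate data; the "integrator" at
    issue in the abstract is the one applied to the reward signal."""
    xs, us, x = [x0], [], x0
    for _ in range(steps):
        u = -K @ x
        us.append(u)
        x = x + dt * (A @ x + B @ u)
        xs.append(x)
    return np.array(xs), np.array(us)

def stage_reward(x, u):
    return float(x @ Q @ x + u @ R @ u)

x0 = np.array([1.0, 0.0])
dt, steps = 1e-3, 20000          # fine grid, 20 s horizon (illustrative)
xs, us = simulate(x0, dt, steps)

# Classical integral reward: quadrature of r(x(t), u(t)) along the trajectory.
integral_reward = trapezoid([stage_reward(x, u) for x, u in zip(xs[:-1], us)],
                            dx=dt)

# Discrete-time reward: evaluate r only at sampling instants t_k = k T,
# with no integration between samples.
T = 0.05                         # sampling period (assumed)
stride = int(round(T / dt))
discrete_reward = T * sum(stage_reward(x, u)
                          for x, u in zip(xs[:-1:stride], us[::stride]))

print(f"integral reward      : {integral_reward:.4f}")
print(f"sampled reward (T={T}): {discrete_reward:.4f}")
print(f"true value x0' P x0  : {x0 @ P @ x0:.4f}")
```

With a fine quadrature grid and a short sampling period, both estimates land near x0' P x0. The paper's contribution, per the abstract, is a learning scheme that works directly from such sampled rewards without model identification; this toy comparison does not implement that scheme.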

Keywords