National Science Open (Mar 2024)
Learning the continuous-time optimal decision law from discrete-time rewards
Abstract
The concept of reward is fundamental to reinforcement learning, which has a wide range of applications in the natural and social sciences. Finding an interpretable reward for decision-making, one that largely shapes the system's behavior, has long been a challenge in reinforcement learning. In this work, we explore a discrete-time reward for reinforcement learning in continuous time and action spaces, which represent many phenomena governed by physical laws. We find that the discrete-time reward enables extraction of the unique continuous-time decision law and improves computational efficiency by eliminating the integral operator that appears in classical results based on integral rewards. We apply this finding to solve output-feedback design problems in power systems. The results show that our approach removes the intermediate stage of identifying a dynamical model. Our work suggests that the discrete-time reward is efficient in the search for the desired decision law, providing a computational tool for understanding and modifying the behavior of large-scale engineering systems through the learned optimal decision.
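As an illustrative contrast (schematic notation, not necessarily the exact formulation used in the main text), the classical continuous-time Bellman equation with an integral reward reads

$$ V\bigl(x(t)\bigr) = \int_{t}^{t+T} r\bigl(x(\tau), u(\tau)\bigr)\,\mathrm{d}\tau + V\bigl(x(t+T)\bigr), $$

whereas a discrete-time reward replaces the integral with stage rewards sampled at instants $t_k$,

$$ V\bigl(x(t_k)\bigr) = r\bigl(x(t_k), u(t_k)\bigr) + V\bigl(x(t_{k+1})\bigr), $$

so the integral operator no longer appears in the learning equations.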
Keywords