IEEE Access (Jan 2023)

Output Feedback Control for Deterministic Unknown Dynamics Discrete-Time System Using Deep Recurrent Q-Networks

  • Adi Novitarini Putri,
  • Egi Hidayat,
  • Dimitri Mahayana,
  • Carmadi Machbub

DOI
https://doi.org/10.1109/ACCESS.2023.3342201
Journal volume & issue
Vol. 11
pp. 141559 – 141572

Abstract


Control theory is commonly applied to systems whose model or dynamics are known. In practice, however, this is a formidable requirement, since not all state information can be measured. The Output Feedback (OPFB) scheme in control systems also has a weakness: it requires an observer, which is contradictory because designing an observer itself requires knowledge of the system dynamics. This research proposes an optimal control scheme using Deep Recurrent Q-Networks (DRQN) to generate an optimal control signal trajectory from a collection of input and output data of the system itself. The proposed approach is based on the Q-Learning method from the Reinforcement Learning (RL) scheme. A Long Short-Term Memory (LSTM) network is used to approximate the Q-function and determine the control signals for a system without a known model. The proposed method was tested on four case studies. The control signal trajectory generated by our algorithm is much smaller than that generated by the classical Q-Learning scheme. This result is consistent with the aim of OPFB, namely that the controller is designed to regulate (drive the state trajectory to zero) while minimizing control signal energy. The same conclusion is supported empirically by the norms of the resulting Q-function trajectories: for the proposed algorithm, these norms in the 1st, 2nd, 3rd, and 4th case studies are 2.11E-08, 3.15E-06, 3.79E-09, and 1.59E-13, respectively.
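To make the abstract's core idea concrete, the sketch below shows (in PyTorch) an LSTM-based Q-network that maps a history of measured outputs to Q-values over a discretized set of control actions, trained with a standard Q-Learning temporal-difference update. This is a minimal illustration of the DRQN concept described here, not the authors' implementation: the dimensions, the action discretization, and the quadratic regulation reward are all illustrative assumptions.

```python
# Minimal DRQN-style sketch (assumptions, not the paper's exact design):
# an LSTM approximates the Q-function from output histories, and a
# Q-Learning TD update trains it toward r + gamma * max_a' Q(h', a').
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq):                 # obs_seq: (batch, T, obs_dim)
        h, _ = self.lstm(obs_seq)               # (batch, T, hidden_dim)
        return self.head(h[:, -1, :])           # Q-values of each discrete action

obs_dim, hidden_dim = 1, 32
actions = torch.linspace(-1.0, 1.0, steps=11)  # assumed discretized control signals
q_net = RecurrentQNet(obs_dim, hidden_dim, n_actions=len(actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.95

def td_update(obs_hist, action_idx, reward, next_obs_hist):
    """One Q-Learning step on a batch of output histories."""
    q_sa = q_net(obs_hist).gather(1, action_idx.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * q_net(next_obs_hist).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with dummy data: a batch of 4 output histories of length 8.
obs_hist      = torch.randn(4, 8, obs_dim)
next_obs_hist = torch.randn(4, 8, obs_dim)
action_idx    = torch.randint(0, len(actions), (4,))
reward        = -obs_hist[:, -1, 0] ** 2       # assumed quadratic regulation cost
td_update(obs_hist, action_idx, reward, next_obs_hist)
```

Feeding the network a window of past outputs (rather than the full state) is what makes the recurrent variant suitable for the output-feedback setting the abstract describes, since the LSTM's hidden state serves as an implicit internal representation in place of an explicit observer.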

Keywords