IEEE Access (Jan 2024)

Combined Constraint on Behavior Cloning and Discriminator in Offline Reinforcement Learning

  • Shunya Kidera,
  • Kosuke Shintani,
  • Toi Tsuneda,
  • Satoshi Yamane

DOI
https://doi.org/10.1109/ACCESS.2024.3361030
Journal volume & issue
Vol. 12
pp. 19942 – 19951

Abstract


In recent years, reinforcement learning (RL) has received a lot of attention because it can automatically learn optimal behavioral policies. However, since RL acquires its policy by repeatedly interacting with the environment, it is difficult to apply to realistic tasks. There has therefore been much research on offline RL (batch RL), which does not interact with the environment but instead learns from previously collected experience. Directly applying standard RL methods to the offline setting fails because of a problem called distributional shift, and methods to suppress distributional shift have been actively studied in offline RL. In this study, we propose a new offline RL algorithm that adds a constraint derived from the discriminator used in Generative Adversarial Networks (GANs) to the offline RL method TD3+BC. We compare the proposed method against existing methods on a benchmark for 3D robot control simulation. TD3+BC tightens its behavior-cloning constraint to suppress distributional shift, but when the quality of the dataset is poor this makes successful learning difficult. The proposed approach addresses this issue by retaining the mechanisms that mitigate distributional shift while introducing a new constraint so that learning does not depend solely on the dataset's quality. This strategy aims to improve accuracy even in cases where the dataset exhibits poor characteristics.
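The abstract does not give implementation details, but the idea can be illustrated with a minimal sketch of the actor update: the standard TD3+BC objective (a Q-value term plus a behavior-cloning term) augmented with a penalty from a GAN-style discriminator that scores whether a state-action pair looks like it came from the dataset. The names `actor`, `critic`, `discriminator`, and the weight `beta` below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def actor_loss(actor, critic, discriminator, states, actions, alpha=2.5, beta=1.0):
    """TD3+BC-style actor loss with an added discriminator constraint (sketch).

    states, actions: a batch sampled from the offline dataset.
    alpha: TD3+BC coefficient that normalizes the Q term against the BC term.
    beta:  hypothetical weight on the discriminator term (not from the paper).
    """
    pi = actor(states)                        # policy actions for dataset states
    q = critic(states, pi)                    # critic's value of the policy actions
    lmbda = alpha / q.abs().mean().detach()   # TD3+BC normalization of the Q term

    bc_loss = F.mse_loss(pi, actions)         # behavior-cloning constraint from TD3+BC

    # Assumed discriminator term: discriminator(s, a) is the probability that
    # (s, a) comes from the dataset; penalize actions it judges out-of-distribution.
    disc_loss = -torch.log(discriminator(states, pi) + 1e-8).mean()

    return -lmbda * q.mean() + bc_loss + beta * disc_loss
```

In this sketch the discriminator would be trained separately, as in a GAN, to distinguish dataset actions from policy actions, so the penalty constrains the policy toward the data distribution without relying only on the per-sample behavior-cloning term.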

Keywords