ICT Express (Dec 2024)
Selective imitation for efficient online reinforcement learning with pre-collected data
Abstract
Deep reinforcement learning (RL) has emerged as a promising solution for autonomous devices requiring sequential decision-making. In the online RL framework, the agent must interact with the environment to collect data, making sample efficiency the central challenge. While off-policy methods in online RL partially address this issue by employing a replay buffer, learning speed remains slow, particularly at the beginning of training, due to the low quality of data collected with the initial policy. To overcome this challenge, we propose Reward-Adaptive Pre-collected Data RL (RAPD-RL), which leverages pre-collected data in addition to online RL. We employ two buffers: one for pre-collected data and another for online-collected data. The policy is trained using both buffers to increase the Q objective and imitate the actions in the dataset. To remain robust to poor-quality (i.e., low-reward) data, our method selectively imitates data based on reward information, thereby enhancing sample efficiency and learning speed. Simulation results demonstrate that the proposed solution converges rapidly and achieves high performance across various dataset qualities.
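The abstract describes a policy objective that combines Q-value maximization with reward-gated imitation of pre-collected actions. The sketch below is a minimal, hypothetical illustration of that idea; the paper does not specify the exact gating rule, so the threshold-based mask, the mean-squared behavior-cloning term, and the `bc_weight` parameter are all assumptions for illustration only.

```python
import numpy as np

def rapd_loss(q_values, policy_actions, dataset_actions, rewards,
              reward_threshold, bc_weight=1.0):
    """Hypothetical selective-imitation policy loss.

    Combines Q-maximization (minimize -Q) with a behavior-cloning term
    that is applied only to transitions whose reward meets a threshold,
    so low-reward pre-collected data is not imitated.
    """
    # Gate imitation by reward: 1.0 for high-reward samples, 0.0 otherwise
    # (the threshold rule is an assumption, not the paper's exact criterion).
    mask = (rewards >= reward_threshold).astype(float)
    # Behavior-cloning error between policy actions and dataset actions.
    bc_error = np.mean((policy_actions - dataset_actions) ** 2, axis=-1)
    # Per-sample loss: maximize Q, imitate only where the mask is active.
    return np.mean(-q_values + bc_weight * mask * bc_error)
```

In a full implementation, this loss would be minimized over mini-batches drawn from both the pre-collected and online buffers, with gradients flowing through `policy_actions` and `q_values`.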