International Journal of Computational Intelligence Systems (Jan 2024)
Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction
Abstract
The application of reinforcement learning (RL) to autonomous robotics places high demands on sample efficiency, since the agent pays a substantial cost for every interaction with the environment. One way to improve sample efficiency is to extract knowledge from existing samples and use it to guide exploration. Typical RL algorithms achieve exploration by exploiting task-specific knowledge or by adding exploration noise; such methods are limited by the current level of policy improvement and lack long-term planning. We propose a novel active exploration deep RL algorithm for continuous action spaces, named active exploration deep reinforcement learning (AEDRL). Our method uses a Gaussian process to model the environment dynamics, which provides a probabilistic description of predicted samples. Action selection is formulated as an optimization problem whose objective is specifically designed to select samples that minimize the uncertainty of the dynamics model. Active exploration is thus achieved through long-term optimized action selection; this long-horizon view provides stronger guidance for learning and enables the agent to explore more informative regions of the action space. The proposed AEDRL algorithm is evaluated on several robotic control tasks, including the classic pendulum problem and five complex articulated robots. AEDRL learns a controller in fewer episodes and demonstrates strong performance and sample efficiency.
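Since the abstract describes the method only at a high level, the following is a minimal, hypothetical sketch of the core idea: fit a Gaussian process to observed transitions and pick the candidate action whose predicted next state is most uncertain, so that executing it is maximally informative for the dynamics model. This is not the authors' AEDRL implementation; the `GPDynamicsModel` class, the random candidate-sampling scheme, and the single-step (rather than long-horizon) objective are all assumptions introduced here for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical sketch: a GP dynamics model with uncertainty-seeking
# action selection, illustrating the idea in the abstract. AEDRL itself
# optimizes action selection over a long horizon; this sketch is greedy.

class GPDynamicsModel:
    def __init__(self):
        kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-3)
        self.gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)

    def fit(self, states, actions, next_states):
        # Inputs are (state, action) pairs; targets are next states.
        X = np.hstack([states, actions])
        self.gp.fit(X, next_states)

    def predictive_std(self, state, actions):
        # Predictive standard deviation of the next state for each
        # candidate action; high std = high model uncertainty.
        X = np.hstack([np.tile(state, (len(actions), 1)), actions])
        _, std = self.gp.predict(X, return_std=True)
        return std.sum(axis=-1) if std.ndim > 1 else std

def select_exploratory_action(model, state, action_low, action_high,
                              n_candidates=256):
    # One-step greedy stand-in for long-horizon optimization: sample
    # candidate actions and pick the one the GP is least certain about.
    candidates = np.random.uniform(action_low, action_high,
                                   size=(n_candidates, len(action_low)))
    stds = model.predictive_std(state, candidates)
    return candidates[np.argmax(stds)]
```

In this reading, exploration is driven by the model's own uncertainty rather than by injected noise: as the GP becomes confident in a region of the state-action space, the selection rule naturally pushes the agent toward regions it has not yet learned.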
Keywords