CAGE: A Curiosity-Driven Graph-Based Explore-Exploit Algorithm for Solving Deterministic Environment MDPs With Limited Episode Problem

Yide Yu; Yue Liu; Dennis Wong; Huijie Li; Jose Vicente Egas-Lopez; Yan Ma

doi:10.1109/ACCESS.2024.3468027

IEEE Access (Jan 2024)

Affiliations

Yide Yu: Faculty of Applied Sciences, Macao Polytechnic University, Macau, SAR, China
Yue Liu: Faculty of Applied Sciences, Macao Polytechnic University, Macau, SAR, China
Dennis Wong: Faculty of Applied Sciences, Macao Polytechnic University, Macau, SAR, China
Huijie Li: Faculty of Applied Sciences, Macao Polytechnic University, Macau, SAR, China
Jose Vicente Egas-Lopez: Artificial Intelligence Research Group, University of Szeged, Szeged, Hungary
Yan Ma: Faculty of Applied Sciences, Macao Polytechnic University, Macau, SAR, China

DOI: https://doi.org/10.1109/ACCESS.2024.3468027
Journal volume & issue: Vol. 12, pp. 144106–144121

Abstract

The explore-exploit dilemma in Markov Decision Processes (MDPs) is a fundamental challenge, especially in deterministic environments akin to real-world scenarios. Balancing exploration and exploitation within a limited number of episodes is crucial for optimizing decision-making. Despite existing research, challenges such as parameter sensitivity, lack of global optimality, and inefficient exploration of low-value regions remain. We introduce the Curiosity-driven Algorithm based on Graph for Exploration (CAGE), which addresses these issues through a graph-based framework. CAGE includes two variants: CAGE-greedy, which ensures optimal solutions given ample episodes, and CAGE-centrality, which prioritizes significant states when episodes are limited. Key contributions include eliminating parameter sensitivity, guaranteeing global optimality, and enhancing exploration efficiency. To validate the performance of the CAGE algorithm series, we design a grid world experiment. The experimental results show that CAGE outperforms a comparative algorithm and validate its effectiveness in complex environments, indicating its feasibility for industrial implementation and its high level of explainability.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
