Decomposing user-defined tasks in a reinforcement learning setup using TextWorld

Thanos Petsanis; Christoforos Keroglou; Athanasios Ch. Kapoutsis; Elias B. Kosmatopoulos; Georgios Ch. Sirakoulis

doi:10.3389/frobt.2023.1280578

Frontiers in Robotics and AI (Dec 2023)

Affiliations

Thanos Petsanis: School of Engineering, Department of Electrical and Computer Engineering, Democritus University of Thrace (DUTH), Xanthi, Greece
Christoforos Keroglou: School of Engineering, Department of Electrical and Computer Engineering, Democritus University of Thrace (DUTH), Xanthi, Greece
Athanasios Ch. Kapoutsis: The Centre for Research and Technology, Information Technologies Institute, Thessaloniki, Greece
Elias B. Kosmatopoulos: School of Engineering, Department of Electrical and Computer Engineering, Democritus University of Thrace (DUTH), Xanthi, Greece
Georgios Ch. Sirakoulis: School of Engineering, Department of Electrical and Computer Engineering, Democritus University of Thrace (DUTH), Xanthi, Greece

Journal volume & issue: Vol. 10

Abstract

This paper proposes a hierarchical reinforcement learning (HRL) method that decomposes a complex task into simpler sub-tasks and leverages them to improve the training of an autonomous agent in a simulated environment. For practical reasons (namely illustrative purposes, ease of implementation, a user-friendly interface, and useful functionality), we employ two Python frameworks, TextWorld and MiniGrid. MiniGrid serves as a 2D simulated representation of the real environment, while TextWorld serves as a high-level abstraction of this simulated environment. Training on this abstraction disentangles manipulation actions from navigation actions and allows us to design a dense reward function, rather than a sparse one, for the lower-level environment, which, as we show, improves training performance. Formal methods are used throughout the paper to establish that our algorithm is not prevented from deriving solutions.
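
To make the contrast between sparse and dense rewards concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes a high-level plan is already available as an ordered list of sub-tasks; the sub-task names, the `DenseRewardTracker` class, and the bonus value are all hypothetical, and neither the TextWorld nor the MiniGrid API is used.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ordered plan. In a hierarchical setup this would be produced
# by the high-level abstraction rather than hard-coded as it is here.
SUBTASKS = ["go_to_key", "pick_up_key", "go_to_door", "open_door"]


@dataclass
class DenseRewardTracker:
    """Pays a fixed bonus each time the next sub-task in the plan is completed."""

    plan: List[str]
    bonus: float = 0.25  # assumed value, chosen so the four bonuses sum to 1.0
    next_index: int = 0

    def step(self, event: Optional[str]) -> float:
        """Return the shaped reward for one low-level step.

        `event` is the sub-task completed on this step, or None if none was.
        Sub-tasks completed out of order earn nothing.
        """
        if self.next_index < len(self.plan) and event == self.plan[self.next_index]:
            self.next_index += 1
            return self.bonus
        return 0.0

    @property
    def done(self) -> bool:
        """True once every sub-task in the plan has been completed in order."""
        return self.next_index == len(self.plan)


def sparse_rewards(events: List[Optional[str]], plan: List[str]) -> List[float]:
    """Sparse baseline: reward 1.0 only on the step that finishes the whole plan."""
    tracker = DenseRewardTracker(plan=plan)
    rewards = []
    for event in events:
        was_done = tracker.done
        tracker.step(event)
        rewards.append(1.0 if tracker.done and not was_done else 0.0)
    return rewards


def dense_rewards(events: List[Optional[str]], plan: List[str]) -> List[float]:
    """Dense variant: a bonus on every step that completes the next sub-task."""
    tracker = DenseRewardTracker(plan=plan)
    return [tracker.step(event) for event in events]


if __name__ == "__main__":
    # One hypothetical episode: None marks a step with no sub-task completed.
    episode = [None, "go_to_key", None, "pick_up_key", "go_to_door", "open_door"]
    print("sparse:", sparse_rewards(episode, SUBTASKS))
    print("dense: ", dense_rewards(episode, SUBTASKS))
```

In this toy episode both schemes assign the same total return, but the dense variant gives the agent feedback at each completed sub-task instead of only at the final step, which is the property the abstract credits for the improved training.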

Published in Frontiers in Robotics and AI

ISSN: 2296-9144 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Technology: Mechanical engineering and machinery; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/robotics-and-ai
