Machines (Feb 2023)

Q-Learning with the Variable Box Method: A Case Study to Land a Solid Rocket

  • Alejandro Tevera-Ruiz,
  • Rodolfo Garcia-Rodriguez,
  • Vicente Parra-Vega,
  • Luis Enrique Ramos-Velasco

DOI
https://doi.org/10.3390/machines11020214
Journal volume & issue
Vol. 11, no. 2
p. 214

Abstract


Some critical tasks require refined actions near the target, for instance, steering a car in a crowded parking lot or landing a rocket. These tasks are critical because failure to comply with the constraints near the target may lead to a fatal (unrecoverable) condition. Thus, higher-resolution actions are required near the target to increase maneuvering precision. Moreover, completing the task becomes more challenging if the environment changes or is uncertain. Therefore, novel approaches have been proposed for these problems. In particular, reinforcement learning schemes such as Q-learning have been suggested to learn from scratch, exploring action–state causal relationships so that action decisions increase the reward. Q-learning refines its action inputs iteratively by exploring the state space to maximize the reward. However, reducing the size of the (constant) resolution boxes, as critical tasks demand, increases the computational load and may lead to the curse of dimensionality. This paper proposes a variable box method that keeps the number of boxes low but refines the boxes only near the target, increasing action resolution where it is needed. The proposal is applied to a critical task, landing a solid rocket, whose dynamics are highly nonlinear, underactuated, non-affine, and subject to environmental disturbances. Simulations show a successful landing without incurring the curse of dimensionality typical of the classical (constant box) Q-learning scheme.
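To illustrate the idea behind a variable box discretization, the following is a minimal sketch, not the paper's actual method: it assumes a 1-D state, a toy transition model, and a hypothetical power-law warp (`variable_box_edges`) that packs box edges more densely around the target, so a tabular Q-learning loop gains resolution near the target without increasing the total number of boxes. The rocket dynamics and the authors' specific partition are not reproduced here.

```python
import numpy as np

# Hypothetical 1-D illustration: non-uniform "variable box" edges, denser
# near the target state (here, 0.0), so resolution increases exactly where
# precision matters, without growing the total box count.
def variable_box_edges(low, high, target, n_boxes, sharpness=3.0):
    # Uniform grid in a warped coordinate; the warp compresses spacing
    # around the target and stretches it far away.
    u = np.linspace(-1.0, 1.0, n_boxes + 1)
    warped = np.sign(u) * np.abs(u) ** sharpness          # finer near 0
    half_span = max(high - target, target - low)
    return np.clip(target + warped * half_span, low, high)

def state_to_box(x, edges):
    # Index of the box containing x (clipped to the valid range).
    return int(np.clip(np.searchsorted(edges, x) - 1, 0, len(edges) - 2))

# Tiny tabular Q-learning loop over the variable boxes (toy dynamics).
edges = variable_box_edges(low=-10.0, high=10.0, target=0.0, n_boxes=20)
n_actions = 3                                    # e.g., thrust down / hold / up
Q = np.zeros((len(edges) - 1, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1

x = np.random.uniform(-10.0, 10.0)
for step in range(10_000):
    s = state_to_box(x, edges)
    a = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[s].argmax())
    x_next = float(np.clip(x + (a - 1) * 0.5 + np.random.normal(0.0, 0.05), -10.0, 10.0))
    r = -abs(x_next)                             # reward peaks at the target
    s_next = state_to_box(x_next, edges)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    x = x_next
```

With a constant box size, reaching the same resolution near the target would require shrinking every box, multiplying the table size; the warp above concentrates the fixed budget of boxes where fine actions are needed.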

Keywords