Machines (Feb 2023)

Q-Learning with the Variable Box Method: A Case Study to Land a Solid Rocket

  • Alejandro Tevera-Ruiz,
  • Rodolfo Garcia-Rodriguez,
  • Vicente Parra-Vega,
  • Luis Enrique Ramos-Velasco

DOI
https://doi.org/10.3390/machines11020214
Journal volume & issue
Vol. 11, no. 2
p. 214

Abstract


Some critical tasks require refined actions near the target, for instance, steering a car in a crowded parking lot or landing a rocket. These tasks are critical because failure to comply with the constraints near the target may lead to a fatal (unrecoverable) condition. Thus, higher-resolution actions are required near the target to increase maneuvering precision. Moreover, completing the task becomes more challenging if the environment changes or is uncertain. Therefore, novel approaches have been proposed for these problems. In particular, reinforcement learning schemes such as Q-learning have been suggested to learn from scratch, exploring action–state causal relationships so that action decisions increase the reward. Q-learning refines its action inputs iteratively by exploring the state space to maximize the reward. However, reducing the size of the (constant) resolution boxes, as critical tasks demand, increases the computational load and may lead to the curse of dimensionality. This paper proposes a variable box method that keeps the number of boxes low but refines the boxes only near the target, increasing action resolution where it is needed. The proposal is applied to a critical task, landing a solid rocket, whose dynamics are highly nonlinear, underactuated, non-affine, and subject to environmental disturbances. Simulations show a successful landing without incurring the curse of dimensionality typical of the classical (constant box) Q-learning scheme.
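To illustrate the idea behind a variable box discretization, the following is a minimal sketch, not the paper's actual method: it assumes a 1-D state, a toy transition model, and a hypothetical power-law warp (`variable_box_edges`) that packs box edges more densely around the target, so a tabular Q-learning loop gains resolution near the target without increasing the total number of boxes. The rocket dynamics and the authors' specific partition are not reproduced here.

```python
import numpy as np

# Hypothetical 1-D illustration: non-uniform "variable box" edges, denser
# near the target state (here, 0.0), so resolution increases exactly where
# precision matters, without growing the total box count.
def variable_box_edges(low, high, target, n_boxes, sharpness=3.0):
    # Uniform grid in a warped coordinate; the warp compresses spacing
    # around the target and stretches it far away.
    u = np.linspace(-1.0, 1.0, n_boxes + 1)
    warped = np.sign(u) * np.abs(u) ** sharpness          # finer near 0
    half_span = max(high - target, target - low)
    return np.clip(target + warped * half_span, low, high)

def state_to_box(x, edges):
    # Index of the box containing x (clipped to the valid range).
    return int(np.clip(np.searchsorted(edges, x) - 1, 0, len(edges) - 2))

# Tiny tabular Q-learning loop over the variable boxes (toy dynamics).
edges = variable_box_edges(low=-10.0, high=10.0, target=0.0, n_boxes=20)
n_actions = 3                                    # e.g., thrust down / hold / up
Q = np.zeros((len(edges) - 1, n_actions))
alpha, gamma, eps = 0.1, 0.99, 0.1

x = np.random.uniform(-10.0, 10.0)
for step in range(10_000):
    s = state_to_box(x, edges)
    a = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[s].argmax())
    x_next = float(np.clip(x + (a - 1) * 0.5 + np.random.normal(0.0, 0.05), -10.0, 10.0))
    r = -abs(x_next)                             # reward peaks at the target
    s_next = state_to_box(x_next, edges)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    x = x_next
```

With a constant box size, reaching the same resolution near the target would require shrinking every box, multiplying the table size; the warp above concentrates the fixed budget of boxes where fine actions are needed.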

Keywords