IEEE Access (Jan 2021)

Linear Quadratic Tracking With Reinforcement Learning Based Reference Trajectory Optimization for the Lunar Hopper in Simulated Environment

  • Toshiki Tanaka,
  • Heidar Malki,
  • Marzia Cescon

DOI: https://doi.org/10.1109/ACCESS.2021.3134592
Journal volume & issue: Vol. 9, pp. 162973–162983

Abstract

In this work, we provide a novel optimal guidance and control strategy for the lunar hopper obstacle-avoidance, descent, and landing problem and demonstrate its behavior using numerical simulations. More specifically, the major contributions of this paper are three-fold: 1) we propose a feedback-based reference trajectory design for lunar hopper guidance, 2) we develop the mathematical models and equations of a linear quadratic tracking (LQT) controller for lunar hopper control, and 3) we develop a method using reinforcement learning to optimize the designed reference trajectory in conjunction with the designed LQT controller, the so-called linear quadratic tracking with reinforcement learning based reference trajectory optimization (LQT-RTO). We evaluated the LQT-RTO in a 2-dimensional (2D) lunar hopper simulation environment against 1) the LQT with a heuristic reference trajectory design (LQT-HTD) and 2) a reinforcement learning based controller (RLC). The numerical simulations confirmed that the LQT-RTO outperformed the LQT-HTD in terms of fuel consumption and outperformed the RLC in terms of landing success rate. Lastly, we provide a theoretical interpretation of the simulation results.
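The LQT-RTO described above pairs a finite-horizon linear quadratic tracking controller with a learned optimization of the reference trajectory that the controller tracks. The sketch below is a minimal illustration of that pairing on an assumed 1D double-integrator "hopper" with a single hop-apex reference parameter; it uses a crude random search as a stand-in for the paper's reinforcement learning component, and the dynamics, cost weights, and reference parameterization are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Illustrative 1D double-integrator "hopper" (altitude, vertical velocity) -- assumed model.
dt, N = 0.1, 100                      # time step [s] and horizon length (assumed)
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.diag([10.0, 1.0])              # state-tracking weight (assumed)
R = np.array([[0.1]])                 # control-effort (fuel) weight (assumed)
Qf = np.diag([100.0, 10.0])           # terminal weight (assumed)

def lqt_gains(ref):
    """Backward Riccati/adjoint recursion for finite-horizon LQ tracking of ref[0..N]."""
    S, vs = Qf, [None] * (N + 1)
    vs[N] = Qf @ ref[N]
    Ks, Kvs = [None] * N, [None] * N
    for k in range(N - 1, -1, -1):
        M = R + B.T @ S @ B
        Ks[k] = np.linalg.solve(M, B.T @ S @ A)        # state-feedback gain
        Kvs[k] = np.linalg.solve(M, B.T)               # feedforward gain on the adjoint term
        S = Q + A.T @ S @ (A - B @ Ks[k])
        vs[k] = Q @ ref[k] + (A - B @ Ks[k]).T @ vs[k + 1]
    return Ks, Kvs, vs

def rollout(ref, x0):
    """Simulate the closed loop; return final state and a control-effort proxy for fuel."""
    Ks, Kvs, vs = lqt_gains(ref)
    x, fuel = x0, 0.0
    for k in range(N):
        u = -Ks[k] @ x + Kvs[k] @ vs[k + 1]            # LQT control law
        fuel += float(np.abs(u).sum()) * dt
        x = A @ x + B @ u
    return x, fuel

def make_ref(apex):
    """One-parameter reference: rise to 'apex' altitude, then descend to land at 0."""
    alt = np.concatenate([np.linspace(0.0, apex, N // 2),
                          np.linspace(apex, 0.0, N - N // 2 + 1)])
    return np.stack([alt, np.gradient(alt, dt)], axis=1)

# Outer loop: random search as a stand-in for the RL-based reference optimization.
x0, clearance = np.array([0.0, 0.0]), 2.0              # apex must clear an assumed obstacle height
best_apex, best_fuel = None, np.inf
for _ in range(200):
    apex = np.random.uniform(clearance, 10.0)
    _, fuel = rollout(make_ref(apex), x0)
    if fuel < best_fuel:
        best_apex, best_fuel = apex, fuel
print(f"selected hop apex ~{best_apex:.2f} m, control effort {best_fuel:.2f}")
```

Replacing the random search with an actual RL agent (for example, a policy over reference-trajectory parameters rewarded by fuel use and landing success) would recover the structure the abstract describes, while the inner LQT loop remains a closed-form tracker.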

Keywords