Frontiers in Neurorobotics (Apr 2013)

Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task

  • Ken Kinjo,
  • Eiji Uchibe,
  • Kenji Doya

DOI: https://doi.org/10.3389/fnbot.2013.00007
Journal volume & issue: Vol. 7

Abstract


The linearly solvable Markov decision process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov, 2009). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions.

In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the nonlinear dynamics can still allow solution of the task, albeit with a higher total cost.

We then perform real robot experiments of a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the camera view and the two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamic models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. The results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
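
The discrete-state LMDP solution described in the abstract can be illustrated with a small numerical sketch. The Python example below is an illustration only, not code from the paper: the 5-state chain, the passive dynamics P, and the state cost q are made-up assumptions. It applies the exponential transformation z(x) = exp(-v(x)), under which the Bellman equation becomes the linear eigenvalue problem lambda * z = diag(exp(-q)) P z, and recovers the value function and the optimal controlled dynamics from the principal eigenpair, following Todorov (2009).

    import numpy as np

    # Toy 5-state chain (assumed for illustration): passive random-walk
    # dynamics P and a state cost q that is lowest at the middle state.
    n = 5
    P = np.zeros((n, n))
    for i in range(n):
        P[i, max(i - 1, 0)] += 0.5
        P[i, min(i + 1, n - 1)] += 0.5
    q = np.array([1.0, 0.8, 0.0, 0.8, 1.0])

    # Exponential transform z(x) = exp(-v(x)); the Bellman equation becomes
    # lambda * z = diag(exp(-q)) @ P @ z, a linear eigenvalue problem.
    G = np.diag(np.exp(-q)) @ P
    eigvals, eigvecs = np.linalg.eig(G)
    k = np.argmax(eigvals.real)              # principal eigenpair
    z = np.abs(eigvecs[:, k].real)           # desirability function
    v = -np.log(z)                           # value function, up to a constant

    # Optimal controlled transition probabilities: u*(x'|x) proportional to p(x'|x) z(x')
    U = P * z[None, :]
    U /= U.sum(axis=1, keepdims=True)
    print(np.round(v - v.min(), 3))          # relative state values
    print(np.round(U, 3))                    # optimal controlled dynamics

In this sketch the eigenvector gives the value function only up to an additive constant, and the controlled dynamics are obtained simply by reweighting the passive dynamics with the desirability, which is what makes the discrete LMDP solvable without iterating over actions.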

Keywords