Informatika (Sep 2024)

Development of an imitation learning method for a neural network system of mobile robot’s movement on example of the maze solving

  • T. Yu. Kim,
  • R. A. Prakapovich

DOI
https://doi.org/10.37661/1816-0301-2024-21-3-48-62
Journal volume & issue
Vol. 21, no. 3
pp. 48 – 62

Abstract

Objectives. To develop a new method for training a mobile robot control system to solve mazes, based on reinforcement learning and the right-hand algorithm.

Methods. The work uses computer modeling in the MATLAB/Simulink environment.

Results. A new method is proposed for training a mobile robot control system capable of implementing the right-hand algorithm for finding the exit from a maze. The method relies on two interacting agents: the first, the expert agent, directly implements the search algorithm and looks for the exit from the maze; the second, the student agent, follows it and tries to learn by imitation. The expert agent implements a discrete algorithm for moving through the maze, makes precise discrete steps, and moves almost independently of the student agent. The only limitation is its speed, which is directly proportional to the distance between the agents. The student agent tries to reduce its distance to the expert agent by trial and error. Learning was implemented with reinforcement learning used in an imitation mode, for which a corresponding reward function was developed; this function keeps the robot's center of mass in the center of the corridor and, when necessary, makes the robot turn to follow the expert agent. The agents move through a virtual testing ground consisting of branched corridors wide enough to allow various movement maneuvers.

Conclusion. It was demonstrated that, with the proposed imitation learning method, the student agent can not only adopt the required behavior patterns from the expert agent (searching for the exit from a previously unknown maze using the right-hand algorithm) but also independently acquire new ones (changing speed in a turn, bypassing small dead-end corridors) that improve its performance of the assigned task.
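
For readers unfamiliar with the right-hand algorithm that the expert agent follows, the discrete rule can be sketched as below. This is only an illustration, not the authors' MATLAB/Simulink implementation: the grid encoding, the function name `right_hand_solve`, the `MAZE` layout, and all parameters are assumptions made for this example. At each step the walker tries to turn right, then go straight, then turn left, and only then turns back.

```python
# Headings in clockwise order: 0 = north, 1 = east, 2 = south, 3 = west,
# stored as (row, column) offsets. Hypothetical encoding for this sketch.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def right_hand_solve(grid, start, heading, goal, max_steps=10_000):
    """Follow the right-hand rule on a grid maze.

    grid: list of strings, where '#' is a wall and any other character is free.
    Returns the list of visited cells, or None if the goal is not reached.
    """
    def is_free(cell):
        r, c = cell
        return 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "#"

    pos, path = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            return path
        # Preference order relative to the current heading:
        # right (+1), straight (0), left (-1), back (+2).
        for turn in (1, 0, -1, 2):
            d = (heading + turn) % 4
            nxt = (pos[0] + DIRS[d][0], pos[1] + DIRS[d][1])
            if is_free(nxt):
                heading, pos = d, nxt
                path.append(pos)
                break
        else:
            return None  # walled in on all four sides
    return None


# Toy maze with one dead-end corridor (left column) before the exit 'E'.
MAZE = [
    "#####",
    "#S..#",
    "#.#.#",
    "#.#E#",
    "#####",
]
path = right_hand_solve(MAZE, start=(1, 1), heading=1, goal=(3, 3))
print(path)
```

In this toy maze the walker first enters the dead-end corridor on the left and has to come back, which is the kind of detour the abstract reports the student agent learning to bypass.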

Keywords