Frontiers in Computer Science (Jul 2024)

Applying learning-from-observation to household service robots: three task common-sense formulations

  • Katsushi Ikeuchi,
  • Jun Takamatsu,
  • Kazuhiro Sasabuchi,
  • Naoki Wake,
  • Atsushi Kanehira

DOI
https://doi.org/10.3389/fcomp.2024.1235239
Journal volume & issue
Vol. 6

Abstract


Utilizing a robot in a new application requires the robot to be reprogrammed each time. To reduce such programming effort, we have been developing “Learning-from-observation (LfO)”, which automatically generates robot programs by observing human demonstrations. Our previous research has focused on the industrial domain; we now aim to expand the field of application to the household-service domain. One of the main issues in introducing the LfO system into this domain is the cluttered environment, which makes it difficult to discern, when observing demonstrations, which movements of the human body parts and which relationships with environment objects are crucial for task execution. To overcome this issue, the system must share task common-sense with the human demonstrator so that it can focus on the demonstrator's specific movements. Here, task common-sense is defined as the movements humans make almost unconsciously to streamline or optimize the execution of a series of tasks. In this paper, we extract and define three types of task common-sense (semi-conscious movements) that should be focused on when observing demonstrations of household tasks, and we propose representations to describe them. Specifically, the paper proposes to use Labanotation to describe the whole-body movements with respect to the environment, contact-webs to describe the hand-finger movements with respect to the tool for grasping, and physical and semantic constraints to describe the movements of the hand with the tool with respect to the environment. Based on these representations, the paper formulates task models, that is, machine-independent robot programs that indicate what-to-do and where-to-do.
In this design process, the necessary and sufficient set of task models to be prepared in the task-model library is determined by the following criteria: for grasping tasks, by the classification of contact-webs according to the purpose of the grasp; for manipulation tasks, by the possible transitions between states defined by either physical or semantic constraints. A skill-agent library is also prepared to collect the skill-agents corresponding to the tasks. The skill-agents in the library are pre-trained using reinforcement learning, with reward functions designed based on the physical and semantic constraints, so that each can execute its task when specific parameters are provided. The paper then explains the task encoder, which obtains task models, and the task decoder, which executes the task models on the robot hardware. The task encoder understands what-to-do from the verbal input and retrieves the corresponding task model from the library. Next, based on the knowledge of each task, the system focuses on specific parts of the demonstration to collect the where-to-do parameters for executing the task. The decoder retrieves the corresponding skill-agents from the skill-agent library, arranges them into a sequence, and inserts the parameters obtained from the demonstration into these skill-agents, allowing the robot to perform the task sequence while following the Labanotation postures. Finally, this paper presents how the system actually works through several example scenes.
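The encoder-decoder pipeline summarized above can be illustrated with a minimal Python sketch. Every name in it (`TASK_MODEL_LIBRARY`, `TaskModel`, `SKILL_AGENT_LIBRARY`, `encode`, `decode`) and every task and parameter name is hypothetical and is not taken from the paper. The sketch assumes the verbal input has already been parsed into task names, and its skill-agents are stubs rather than reinforcement-learned policies. It only shows the division of labor: each task model supplies what-to-do and names its where-to-do parameters, the encoder keeps only those parts of the demonstration, and the decoder maps each task model to a skill-agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical task-model library: each task type (what-to-do) lists the
# where-to-do parameters that must be read from the demonstration.
TASK_MODEL_LIBRARY: Dict[str, List[str]] = {
    "grasp": ["grasp_position", "approach_direction"],
    "pick": ["lift_direction"],
    "place": ["target_position"],
    "release": [],
}


@dataclass
class TaskModel:
    """Machine-independent task step: what-to-do plus where-to-do parameters."""
    what_to_do: str
    where_to_do: Dict[str, object] = field(default_factory=dict)


# Hypothetical skill-agent library. Real skill-agents would be policies
# pre-trained with reinforcement learning; these stubs only return the
# command string they would send to the robot.
SkillAgent = Callable[[Dict[str, object]], str]
SKILL_AGENT_LIBRARY: Dict[str, SkillAgent] = {
    name: (lambda params, name=name: f"{name}({params})")
    for name in TASK_MODEL_LIBRARY
}


def encode(verbal_steps: List[str],
           demonstration: Dict[str, Dict[str, object]]) -> List[TaskModel]:
    """Task encoder sketch: look up each step's task model, then keep only
    the observed values that the task model asks for."""
    models = []
    for step in verbal_steps:
        if step not in TASK_MODEL_LIBRARY:
            raise KeyError(f"no task model for {step!r}")
        observed = demonstration.get(step, {})
        params = {k: observed[k] for k in TASK_MODEL_LIBRARY[step]}
        models.append(TaskModel(step, params))
    return models


def decode(models: List[TaskModel]) -> List[str]:
    """Task decoder sketch: run the matching skill-agent for each task model."""
    return [SKILL_AGENT_LIBRARY[m.what_to_do](m.where_to_do) for m in models]


# Usage: "hand_speed" is observed but ignored, since no task model asks for it.
demo = {
    "grasp": {"grasp_position": (0.4, 0.1, 0.8), "approach_direction": "top",
              "hand_speed": 0.3},
    "pick": {"lift_direction": "up"},
}
task_models = encode(["grasp", "pick", "release"], demo)
commands = decode(task_models)
```

In this sketch the task model, not the observer, decides which observed quantities matter, so irrelevant motions in a cluttered scene (here, `hand_speed`) are never read.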

Keywords