IEEE Access (Jan 2022)

Multimodal Engagement Prediction in Multiperson Human–Robot Interaction

  • Ahmed A. Abdelrahman,
  • Dominykas Strazdas,
  • Aly Khalifa,
  • Jan Hintz,
  • Thorsten Hempel,
  • Ayoub Al-Hamadi

DOI
https://doi.org/10.1109/ACCESS.2022.3182469
Journal volume & issue
Vol. 10
pp. 61980 – 61991

Abstract

The ability to measure the engagement level of humans interacting with robots paves the way towards intuitive and safe human–robot interaction. Recent approaches have made reasonable progress in predicting human engagement in physically situated environments. However, engagement estimation remains a challenging problem, especially in open-world environments, due to the difficulty of capturing and monitoring a variety of human social cues in real time. Furthermore, interactions may involve a group of subjects engaging with the robot simultaneously, which increases the prediction complexity. In this paper, we design a real-time engagement estimation system for humans interacting with robots, with generalization capability. We propose to estimate engagement in three stages, combining learning-based and rule-based approaches. First, state-of-the-art deep learning methods extract engagement features from input frames. Then, a simple neural network estimates a focus-of-attention score by incorporating gaze and head-pose features, and this score is assigned to each subject in the scene using a face recognition algorithm. Finally, a rule-based classifier predicts the engagement state of each subject, which is used to initiate or terminate the interaction with the robot. To evaluate our system effectively, we assess our approach for each stage separately. Additionally, we conduct an online evaluation study in which subjects interact freely with an industrial robot. Our model achieves average precision, recall, and F-score of 96%, 90%, and 93%, respectively.
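The three-stage pipeline described above can be sketched in simplified form. The mapping below from gaze and head-pose deviation to a focus-of-attention score, as well as the thresholds and weights, are illustrative assumptions, not the paper's actual learned parameters: the paper learns the score with a small neural network, whereas a hand-tuned linear blend stands in for it here.

```python
# Hypothetical sketch of the paper's three-stage engagement pipeline.
# All numeric values (weights, angle limits, thresholds) are assumptions
# for illustration only, not taken from the paper.

def attention_score(gaze_angle_deg, head_yaw_deg):
    """Stage 2 (simplified): map a subject's gaze and head-pose deviation
    from the robot's camera axis to a focus-of-attention score in [0, 1].
    Larger deviations yield lower scores."""
    gaze_term = max(0.0, 1.0 - abs(gaze_angle_deg) / 45.0)
    head_term = max(0.0, 1.0 - abs(head_yaw_deg) / 60.0)
    return 0.6 * gaze_term + 0.4 * head_term

def classify_engagement(score, currently_engaged,
                        start_thresh=0.7, stop_thresh=0.4):
    """Stage 3: rule-based decision with hysteresis, so the robot does
    not flicker between initiating and terminating an interaction when
    the score hovers near a single threshold."""
    if not currently_engaged and score >= start_thresh:
        return True   # initiate interaction
    if currently_engaged and score < stop_thresh:
        return False  # terminate interaction
    return currently_engaged

# A subject looking almost straight at the robot becomes engaged...
engaged = classify_engagement(attention_score(5.0, 10.0), False)
# ...and stays engaged through a brief moderate glance away (hysteresis).
engaged = classify_engagement(attention_score(20.0, 20.0), engaged)
print(engaged)  # True
```

In a multiperson scene, stage 1 (face detection and recognition) would produce one gaze/head-pose feature set per subject, and the score and rule would be applied to each identity independently.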

Keywords