DeepFace-Attention: Multimodal Face Biometrics for Attention Estimation With Application to e-Learning

Roberto Daza; Luis F. Gomez; Julian Fierrez; Aythami Morales; Ruben Tolosana; Javier Ortega-Garcia

doi:10.1109/ACCESS.2024.3437291

IEEE Access (Jan 2024)

Affiliations

Roberto Daza: Biometrics and Data Pattern Analytics Laboratory, Universidad Autonoma de Madrid, Campus de Cantoblanco, Madrid, Spain
Luis F. Gomez: Biometrics and Data Pattern Analytics Laboratory, Universidad Autonoma de Madrid, Campus de Cantoblanco, Madrid, Spain
Julian Fierrez: Biometrics and Data Pattern Analytics Laboratory, Universidad Autonoma de Madrid, Campus de Cantoblanco, Madrid, Spain
Aythami Morales: Biometrics and Data Pattern Analytics Laboratory, Universidad Autonoma de Madrid, Campus de Cantoblanco, Madrid, Spain
Ruben Tolosana: Biometrics and Data Pattern Analytics Laboratory, Universidad Autonoma de Madrid, Campus de Cantoblanco, Madrid, Spain
Javier Ortega-Garcia: Biometrics and Data Pattern Analytics Laboratory, Universidad Autonoma de Madrid, Campus de Cantoblanco, Madrid, Spain

DOI: https://doi.org/10.1109/ACCESS.2024.3437291
Journal volume & issue: Vol. 12, pp. 111343–111359

Abstract

This work introduces an innovative method for estimating attention levels (cognitive load) using an ensemble of facial analysis techniques applied to webcam videos. The method is particularly useful in e-learning applications, among others, so we trained, evaluated, and compared our approach on mEBAL2, a public multimodal database acquired in an e-learning environment. mEBAL2 comprises data from 60 users who performed 8 different tasks of varying difficulty, which led to changes in their cognitive load. Our approach adapts state-of-the-art facial analysis technologies to quantify the users' cognitive load as high or low attention. It uses several behavioral signals and physiological processes related to cognitive load, such as eyeblink, heart rate, facial action units, and head pose, among others. Furthermore, we study which individual features obtain better results, which combinations are most efficient, how local and global features compare, and how the length of the time interval affects attention level estimation, among other aspects. We find that global facial features are more appropriate for multimodal systems using score-level fusion, particularly as the temporal window increases. On the other hand, local features are more suitable for fusion through neural network training with score-level fusion approaches. Our method outperforms existing state-of-the-art accuracies on the public mEBAL2 benchmark.
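
The abstract refers to score-level fusion without detailing it. The following is a minimal illustrative sketch of the general idea, not the authors' implementation: each facial module is assumed to output one attention score in [0, 1] per time window, the scores are combined by a weighted average, and the result is thresholded into high or low attention. The function names, the module scores, the equal weights, and the 0.5 threshold are all hypothetical.

```python
import numpy as np


def fuse_scores(module_scores, weights=None):
    """Weighted-average score-level fusion.

    module_scores: array-like of shape (n_modules, n_windows), one row of
    per-window attention scores for each module (assumed to lie in [0, 1]).
    weights: optional per-module weights; equal weights if omitted.
    """
    scores = np.asarray(module_scores, dtype=float)
    if weights is None:
        weights = np.ones(scores.shape[0])
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    return weights @ scores  # shape (n_windows,)


def classify_attention(fused, threshold=0.5):
    """Map fused scores to 'high' / 'low' attention (threshold is hypothetical)."""
    return np.where(np.asarray(fused) >= threshold, "high", "low")


# Made-up scores for three modules over three time windows.
module_scores = {
    "eyeblink": [0.2, 0.7, 0.9],
    "heart_rate": [0.4, 0.6, 0.8],
    "head_pose": [0.3, 0.5, 0.7],
}

fused = fuse_scores(list(module_scores.values()))
labels = classify_attention(fused)
print(fused, labels)
```

In practice the weights and the decision threshold would be tuned on held-out data rather than fixed by hand.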

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
