IEEE Access (Jan 2024)

Multi-Frame Transfer Learning Framework for Facial Emotion Recognition in e-Learning Contexts

  • Jamie Pordoy
  • Haleem Farman
  • Nevena Kostadinova Dicheva
  • Aamir Anwar
  • Moustafa M. Nasralla
  • Nasrullah Khilji
  • Ikram Ur Rehman

DOI
https://doi.org/10.1109/ACCESS.2024.3478072
Journal volume & issue
Vol. 12
pp. 151360–151381

Abstract


Advancements in online learning have created new opportunities for students to enhance academic progression and broaden educational accessibility. However, notable concerns have arisen regarding the reliability of Facial Emotion Recognition (FER) in e-Learning. The variability of facial expressions, such as micro-expressions and rapid transitions between emotions, poses a challenge for consistent emotion classification from single-frame images, with state-of-the-art models recording accuracy scores between 64% and 80% on the FER-2013 dataset. To mitigate these concerns, we curated a foundational Base dataset comprising the emotions encountered in e-Learning contexts (Bored, Confused, Engaged, Frustrated, and Neutral). The Base dataset was then supplemented with video data recorded from 100 students, 50 from the United Kingdom (UK) and 50 from Saudi Arabia, participating in 10 e-Learning sessions. The sessions were segmented into single-frame images and annotated by class. We then partitioned the supplemented Base dataset into two datasets, representing the UK and Saudi Arabia, respectively. To address the inherent limitations observed in single-frame FER, this study proposes a Multi-Frame Transfer Learning (MFTL) framework aimed at improving emotion classification in e-Learning contexts. The proposed framework combines an augmented MobileNet-V1 architecture with a custom temporal algorithm to classify the Dominant Emotion (DE) from a fixed-length sequence of frames. Experimental results on single-frame images recorded accuracy, $f_1$-score, and Matthews Correlation Coefficient (MCC) scores of 0.8724, 0.8726, and 0.8407 on dataset 1, and 0.8998, 0.8991, and 0.8752 on dataset 2, respectively. We then conducted 15 experiments using representative subsets of sequential video excerpts with a total duration of 30 minutes. The proposed framework's multi-frame approach recorded an average accuracy of 0.8778 across both datasets, with datasets 1 and 2 achieving 0.8646 and 0.8929, respectively. These findings validate our approach, as evidenced by high classification scores and the framework's efficacy in handling micro-expressions and emotional outliers. In conclusion, the proposed framework represents a substantial advancement in FER, offering a robust and refined approach tailored for online education. As e-Learning becomes increasingly integral to education, the insights and methodologies presented here promise to enhance student engagement, learning outcomes, and the overall online educational experience.
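
The abstract does not specify the authors' custom temporal algorithm or classification head, so the following is a minimal illustrative sketch only: it assumes a frozen ImageNet-pretrained MobileNet-V1 backbone with a new softmax head (a standard transfer-learning setup), an input size of 224x224, the five class names listed above, and a simple majority vote over per-frame predictions within a fixed-length window. None of these details should be read as the paper's actual implementation.

# Illustrative sketch, not the authors' MFTL implementation.
# Assumed: input size, frozen backbone, and majority-vote aggregation.
import numpy as np
import tensorflow as tf

CLASSES = ["Bored", "Confused", "Engaged", "Frustrated", "Neutral"]

def build_model(num_classes: int = len(CLASSES)) -> tf.keras.Model:
    # MobileNet-V1 backbone with ImageNet weights; the original top is
    # removed and replaced with a task-specific softmax head.
    base = tf.keras.applications.MobileNet(
        input_shape=(224, 224, 3), include_top=False,
        weights="imagenet", pooling="avg")
    base.trainable = False  # freeze pretrained features; train the head only
    head = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    return tf.keras.Model(base.input, head)

def dominant_emotion(model: tf.keras.Model, frames: np.ndarray) -> str:
    # Classify each frame in a fixed-length sequence, then return the most
    # frequent class across the window as the Dominant Emotion (DE).
    x = tf.keras.applications.mobilenet.preprocess_input(
        frames.astype("float32"))                    # frames: (n, 224, 224, 3)
    probs = model.predict(x, verbose=0)              # (n_frames, n_classes)
    votes = np.bincount(probs.argmax(axis=1), minlength=len(CLASSES))
    return CLASSES[int(votes.argmax())]

Aggregating a window of per-frame predictions in this way is one plausible reading of "classifying the Dominant Emotion from a fixed-length sequence of frames": a single mislabelled frame (for example, a micro-expression) is outvoted by the rest of the window, which is consistent with the robustness to emotional outliers reported in the abstract.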

Keywords

Facial emotion recognition, deep learning, e-learning, transfer learning