Alexandria Engineering Journal (Apr 2025)

Attention-enhanced optimized deep ensemble network for effective facial emotion recognition

  • Taimoor Khan,
  • Muhammad Yasir,
  • Chang Choi

Journal volume & issue
Vol. 119
pp. 111–123

Abstract


Facial emotion recognition (FER) is rapidly advancing, with significant applications in healthcare, human-computer interaction, and biometric security, driven by recent advances in artificial intelligence (AI), computer vision, and deep learning. In recent studies, various algorithms have encountered challenges such as data scarcity, low-resolution images, and suboptimal performance, particularly in real-time applications. Addressing these issues is essential for improving model performance and FER efficiency in practical settings. Therefore, in this study, we propose an innovative framework, an attention-based ensemble network (EA-Net), for accurate FER. The proposed framework involves two distinct phases: data preprocessing and model training. The preprocessing phase applies data augmentation and super-resolution techniques to increase the quantity and quality of the data, respectively; expanding the dataset and refining its quality in this way substantially improves FER performance. In the second phase, an ensemble-based technique uses a parallel connection of EfficientNetB0 and InceptionV3 as backbones for feature extraction. A channel attention module (CAM) and a spatial attention module (SAM) are then incorporated sequentially into the framework to select the dominant features. Finally, fully connected (FC) layers classify the facial emotions (anger, disgust, fear, happy, neutral, sad, and surprise). For a fair evaluation, we conducted extensive experiments and compared the proposed EA-Net against several state-of-the-art (SOTA) techniques on the FER and Karolinska Directed Emotional Faces (KDEF) datasets. The proposed framework achieves a precision of 76.10%, recall of 77.98%, F1-score of 77.98%, and accuracy of 78.60% on the FER dataset, and even higher results on KDEF: a precision of 99.61%, recall of 98.65%, F1-score of 99.13%, and accuracy of 99.30%. Overall, the comprehensive experimental analysis demonstrates that the proposed network is a robust solution for biometric security, healthcare, and surveillance applications: EA-Net accurately recognizes facial emotions and generalizes well even in resource-constrained environments. The source code is publicly available at https://github.com/TaimoorKhan561/Facial-Emotion-Recognition
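The pipeline described in the abstract (parallel backbone features fused, then channel attention followed by spatial attention) can be sketched with plain NumPy. This is only an illustrative, CBAM-style interpretation, not the authors' implementation: the function names, the small random tensors standing in for EfficientNetB0/InceptionV3 feature maps, the reduction ratio `r`, and the simplified spatial module (a weighted sum of the channel-wise mean and max maps instead of a learned convolution) are all assumptions made for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """CAM sketch: per-channel weights from pooled descriptors through a shared MLP.

    feat: (H, W, C) feature map; w1: (C, C//r), w2: (C//r, C) MLP weights.
    """
    avg = feat.mean(axis=(0, 1))                       # (C,) average-pooled descriptor
    mx = feat.max(axis=(0, 1))                         # (C,) max-pooled descriptor
    att = sigmoid(np.maximum(avg @ w1, 0) @ w2 +       # shared two-layer MLP (ReLU hidden)
                  np.maximum(mx @ w1, 0) @ w2)         # (C,) channel attention weights
    return feat * att                                  # broadcast over H and W

def spatial_attention(feat, alpha=1.0, beta=1.0):
    """SAM sketch, simplified: weight each location by its mean/max channel response."""
    avg = feat.mean(axis=2, keepdims=True)             # (H, W, 1)
    mx = feat.max(axis=2, keepdims=True)               # (H, W, 1)
    att = sigmoid(alpha * avg + beta * mx)             # (H, W, 1) spatial attention map
    return feat * att                                  # broadcast over channels

rng = np.random.default_rng(0)
C, r = 8, 2
feat_a = rng.normal(size=(7, 7, C))                    # stand-in for EfficientNetB0 features
feat_b = rng.normal(size=(7, 7, C))                    # stand-in for InceptionV3 features
fused = np.concatenate([feat_a, feat_b], axis=2)       # parallel-branch fusion -> (7, 7, 2C)

w1 = rng.normal(size=(2 * C, (2 * C) // r)) * 0.1      # toy MLP weights (untrained)
w2 = rng.normal(size=((2 * C) // r, 2 * C)) * 0.1
out = spatial_attention(channel_attention(fused, w1, w2))
print(out.shape)                                       # (7, 7, 16)
```

In a real model the refined map `out` would be flattened and passed through the FC layers for seven-way emotion classification; here the weights are random, so only the tensor shapes and the CAM-then-SAM ordering are meaningful.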

Keywords