Human Emotion Recognition Based on Spatio-Temporal Facial Features Using HOG-HOF and VGG-LSTM

Hajar Chouhayebi; Mohamed Adnane Mahraz; Jamal Riffi; Hamid Tairi; Nawal Alioua

doi:10.3390/computers13040101

Computers (Apr 2024)

Affiliations

Hajar Chouhayebi: Laboratory of Computer Science, Signals, Automation and Cognitivism (LISAC), Department of Computer Science, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco
Mohamed Adnane Mahraz: Laboratory of Computer Science, Signals, Automation and Cognitivism (LISAC), Department of Computer Science, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco
Jamal Riffi: Laboratory of Computer Science, Signals, Automation and Cognitivism (LISAC), Department of Computer Science, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco
Hamid Tairi: Laboratory of Computer Science, Signals, Automation and Cognitivism (LISAC), Department of Computer Science, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco
Nawal Alioua: LMC, Polydisciplinary Faculty, Department of Mathematics and Computer Science, Cadi Ayyad University, Safi 46000, Morocco

Journal volume & issue: Vol. 13, no. 4, p. 101

Abstract


Human emotion recognition is important in many technological domains, reflecting our growing reliance on technology. Facial expressions play a vital role in conveying human emotions. Although deep learning has been successful at recognizing emotions in video sequences, it struggles to model spatio-temporal interactions effectively and to identify salient features, which limits its accuracy. This paper proposes a facial expression recognition algorithm that combines deep learning with dynamic texture methods. In the first stage, facial features are extracted with the Visual Geometry Group (VGG19) model and fed into Long Short-Term Memory (LSTM) cells to capture spatio-temporal information. In addition, the HOG-HOF descriptor (Histogram of Oriented Gradients and Histogram of Optical Flow) is used to extract dynamic features from the video sequences, capturing changes in facial appearance over time. Fusing these two representations with Multimodal Compact Bilinear (MCB) pooling yields an effective descriptor vector, which is then classified by a Support Vector Machine (SVM). The SVM was chosen because it is simpler to interpret than deep learning models, which makes the decision-making process behind emotion classification easier to understand. In experiments on the eNTERFACE05 database, the fusion method outperformed existing state-of-the-art methods by a margin of approximately 1%. Overall, the proposed approach achieved higher accuracy and robust detection.
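The fusion step is the most self-contained part of the pipeline described above. The sketch below shows compact bilinear pooling in its common formulation: each feature vector is projected with a random Count Sketch, and the two sketches are combined by circular convolution computed through the FFT, giving a fixed-size, randomized compression of the outer product of the two vectors. The abstract does not state the feature sizes, the sketch dimension, or the hashing details used in the paper, so the function names, dimensions, and random parameters here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def make_sketch_params(in_dim, out_dim, rng):
    """Draw a random bucket index h[i] in [0, out_dim) and a random sign s[i] in {-1, +1}
    for each of the in_dim input coordinates."""
    h = rng.integers(0, out_dim, size=in_dim)
    s = rng.choice(np.array([-1.0, 1.0]), size=in_dim)
    return h, s


def count_sketch(x, h, s, out_dim):
    """Count Sketch projection: out[h[i]] += s[i] * x[i] for every coordinate i."""
    out = np.zeros(out_dim)
    np.add.at(out, h, s * x)  # unbuffered add, so colliding buckets accumulate correctly
    return out


def mcb_fuse(x, y, params_x, params_y, out_dim):
    """Compact bilinear pooling of x and y into out_dim dimensions.

    Circular convolution of the two sketches equals a Count Sketch of the outer
    product x y^T with combined hash (h_x[i] + h_y[j]) mod out_dim and sign
    s_x[i] * s_y[j]; the FFT turns that convolution into an element-wise product.
    """
    sx = count_sketch(x, *params_x, out_dim)
    sy = count_sketch(y, *params_y, out_dim)
    return np.fft.irfft(np.fft.rfft(sx) * np.fft.rfft(sy), n=out_dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: the abstract does not state the real feature dimensions.
    vgg_lstm_feature = rng.standard_normal(512)  # stand-in for the VGG19-LSTM output
    hog_hof_feature = rng.standard_normal(162)   # stand-in for a HOG-HOF descriptor
    d = 1024                                      # hypothetical sketch dimension
    params_a = make_sketch_params(vgg_lstm_feature.size, d, rng)
    params_b = make_sketch_params(hog_hof_feature.size, d, rng)
    fused = mcb_fuse(vgg_lstm_feature, hog_hof_feature, params_a, params_b, d)
    print(fused.shape)  # (1024,)
```

Because the output length is fixed at out_dim, the two input descriptors may have different sizes. The same hash and sign parameters must be reused for every video so that all fused vectors live in a common space before they are passed to an SVM (for example, scikit-learn's sklearn.svm.SVC).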

Published in Computers

ISSN: 2073-431X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computers
