Alexandria Engineering Journal (Dec 2024)
Enhancing emotion prediction using deep learning and distributed federated systems with SMOTE oversampling technique
Abstract
Facial Expression Recognition (FER) categorizes human emotions by analyzing facial features and therefore plays a vital role in emotion recognition systems. Prior studies have largely focused on recognizing emotions from voice or speech alone. To address the limitations of these methods, this work detects emotions from both voice recordings and three-dimensional (3D) facial images using suitable datasets and novel deep-learning techniques. Three validated audio-visual datasets are chosen for analysis: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Acted Facial Expressions in the Wild (AFEW), and eNTERFACE’05. RAVDESS provides the audio recordings, while AFEW and eNTERFACE’05 provide 3D images of human faces. The Synthetic Minority Oversampling Technique (SMOTE) is applied to balance the datasets by oversampling minority classes, which mitigates overfitting. The research employs a federated 3D Convolutional Neural Network (3D-CNN) to predict human emotions accurately; the 3D-CNN can recognize a person's facial information from any viewing angle. Mel Frequency Cepstral Coefficients (MFCCs) are used to extract and refine features from the voices. A significant contribution is the federated learning scheme, in which the 3D-CNN is trained across multiple clients simultaneously through local weight updates that are aggregated into global updates. The proposed framework achieves a prediction accuracy of 95.72 %, outperforming existing methods. This approach benefits many applications, such as emotion analysis and healthcare.
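
The abstract does not give implementation details for the SMOTE balancing step; a minimal sketch using the imbalanced-learn library is shown below. The feature matrix, class counts, and k_neighbors value are illustrative assumptions, not the authors' pipeline.

    # Illustrative sketch: SMOTE oversampling to balance emotion classes.
    # Assumes features have already been extracted into X (n_samples x n_features)
    # with integer emotion labels y; placeholder data, not the paper's datasets.
    import numpy as np
    from collections import Counter
    from imblearn.over_sampling import SMOTE

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 40))                  # placeholder feature vectors
    y = np.array([0] * 200 + [1] * 70 + [2] * 30)   # imbalanced emotion labels

    print("Before:", Counter(y))
    smote = SMOTE(k_neighbors=5, random_state=0)    # synthesize minority samples
    X_bal, y_bal = smote.fit_resample(X, y)
    print("After: ", Counter(y_bal))                # all classes equally represented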
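As a sketch of the 3D-CNN component, a minimal PyTorch module is given below. The layer sizes, the 16-frame clip shape, and the eight output classes (matching RAVDESS's eight emotions) are assumptions, since the abstract does not specify the architecture.

    import torch
    import torch.nn as nn

    class Emotion3DCNN(nn.Module):
        # Minimal 3D-CNN sketch: two Conv3d blocks followed by a linear classifier.
        # Input: video clips shaped (batch, channels=3, frames=16, 112, 112).
        def __init__(self, num_classes=8):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool3d(2),
                nn.Conv3d(32, 64, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),        # pool space and time to one descriptor
            )
            self.classifier = nn.Linear(64, num_classes)

        def forward(self, x):
            x = self.features(x).flatten(1)
            return self.classifier(x)

    clips = torch.randn(2, 3, 16, 112, 112)         # dummy batch of two clips
    logits = Emotion3DCNN()(clips)                  # -> shape (2, 8)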
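MFCC extraction from the audio recordings could be done, for example, with librosa; the file path and the choice of 40 coefficients are illustrative assumptions.

    import librosa
    import numpy as np

    # Illustrative MFCC extraction; "speech.wav" is a placeholder path.
    signal, sr = librosa.load("speech.wav", sr=22050)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=40)   # shape (40, n_frames)
    features = np.mean(mfcc, axis=1)            # mean over time -> 40-dim vector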
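The global and local weight updates of the federated scheme are not detailed in the abstract; a FedAvg-style aggregation is one common realization and is sketched below. The stand-in linear model and three-client setup are assumptions for illustration.

    import copy
    import torch

    def federated_average(client_states):
        # FedAvg-style aggregation: average each parameter across client models.
        global_state = copy.deepcopy(client_states[0])
        for key in global_state:
            stacked = torch.stack([cs[key].float() for cs in client_states])
            global_state[key] = stacked.mean(dim=0)
        return global_state

    # Each round: clients train locally, then the server averages their weights
    # and broadcasts the result back (illustrative, not the authors' exact protocol).
    clients = [torch.nn.Linear(40, 8) for _ in range(3)]   # stand-in for the 3D-CNN
    global_state = federated_average([c.state_dict() for c in clients])
    for c in clients:
        c.load_state_dict(global_state)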