Measurement: Sensors (Apr 2023)

Empirical analysis of multiple modalities for emotion recognition using convolutional neural network

  • Ram Avtar Jaswal,
  • Sunil Dhingra

Journal volume & issue
Vol. 26
p. 100716

Abstract


Emotion recognition is a dynamic process that focuses on a person's emotional state, and the emotions associated with each individual's activities differ. Humans express their emotions in a variety of ways, and evaluating these emotions accurately is critical to effective communication. Emotion detection is also vital for social engagement in daily life, because emotions influence human behaviour. In this research, we use a variety of machine learning methods to sense and analyse EEG and audio signals, treating emotion as a multimodal object. The analysis presents a multi-sensory emotion identification system based on the integration of multi-channel information from speech and EEG signals. The purpose of the proposed system is to increase the accuracy of emotion identification during human-computer interaction: combining EEG-based and speech-based emotion recognition makes it possible to distinguish and perceive different emotional states more precisely. Giving a machine emotional intelligence remains a challenging problem, yet researchers studying human emotions rarely combine multiple methods into a single framework. Because emotion affects nearly every modality, this information can be used to predict emotional states from human behaviour. A brain-computer interface (BCI) can be used to implement a model of the brain in conjunction with a model of voice signals to normalise emotional responses, and this processing method needs to be investigated as a fusion model to improve the accuracy of emotion detection. In this paper, we take up this challenge and analyse multiple modalities for emotion recognition using a convolutional neural network, taking input signals from EEG and audio.
Furthermore, features were extracted from both signals and fused using the principal component analysis method, and the Grey Wolf optimisation algorithm was applied to select the combined features. A convolutional neural network was then trained and evaluated in terms of accuracy. The results show that the proposed technique achieves better accuracy (94.44%) than the existing approach.
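The PCA-based fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature counts (32 EEG-derived and 20 speech-derived features per trial), the random data, and the `pca_fuse` helper are all assumptions made for the example. Per-modality feature vectors are concatenated and projected onto the top principal components.

```python
import numpy as np

def pca_fuse(eeg_feats, audio_feats, n_components=8):
    # Hypothetical helper: concatenate per-sample feature vectors from
    # both modalities, then project onto the top principal components
    # (PCA computed via SVD of the centred data matrix).
    X = np.hstack([eeg_feats, audio_feats])
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by variance.
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ Vt[:n_components].T

rng = np.random.default_rng(0)
eeg = rng.normal(size=(100, 32))    # assumed: 32 EEG features per trial
audio = rng.normal(size=(100, 20))  # assumed: 20 speech features per trial
fused = pca_fuse(eeg, audio, n_components=8)
print(fused.shape)  # (100, 8)
```

The fused 8-dimensional vectors would then feed the classifier; in the paper the reduced features additionally pass through a Grey Wolf selection stage before the CNN is trained.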
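The Grey Wolf optimisation algorithm mentioned above is a population-based metaheuristic in which candidate solutions (wolves) move toward the three best solutions found so far (alpha, beta, delta). The sketch below shows the core update rule on a toy minimisation problem; the paper applies the same optimiser to feature selection, which is not reproduced here. All names and parameters are illustrative assumptions.

```python
import numpy as np

def grey_wolf_optimize(fitness, dim, n_wolves=10, iters=50,
                       bounds=(-5.0, 5.0), seed=0):
    # Minimal Grey Wolf Optimizer sketch (minimisation).
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    for t in range(iters):
        scores = np.array([fitness(w) for w in wolves])
        order = np.argsort(scores)
        # The three best wolves guide the rest of the pack.
        alpha, beta, delta = wolves[order[:3]]
        a = 2.0 - 2.0 * t / iters  # exploration coefficient decays to 0
        for i in range(n_wolves):
            new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2.0 * a * r1 - a
                C = 2.0 * r2
                # Move toward each leader, averaged below.
                new += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(new / 3.0, lo, hi)
    scores = np.array([fitness(w) for w in wolves])
    return wolves[np.argmin(scores)]

# Toy objective: minimise the sphere function, optimum at the origin.
best = grey_wolf_optimize(lambda x: float(np.sum(x ** 2)), dim=5)
```

For feature selection, the fitness function would instead score a binary feature mask (e.g. by classifier accuracy on the selected features), with positions thresholded to 0/1.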

Keywords