Alexandria Engineering Journal (Feb 2025)

IoT-based approach to multimodal music emotion recognition

  • Hanbing Zhao
  • Ling Jin

Journal volume & issue
Vol. 113
pp. 19–31

Abstract


With the rapid development of Internet of Things (IoT) technology, multimodal emotion recognition has become an important research direction in artificial intelligence. However, existing methods often face challenges in efficiency and accuracy when processing multimodal data. This study proposes an IoT-supported multimodal music emotion recognition model that integrates audio and video signals to achieve real-time emotion recognition and classification. The proposed CGF-Net model combines a 3D Convolutional Neural Network (3D-CNN), a Gated Recurrent Unit (GRU), and a Fully Connected Network (FCN). By effectively fusing multimodal data, the model improves both the accuracy and the efficiency of music emotion recognition. Extensive experiments were conducted on two public datasets, DEAM and DEAP, and the results show that CGF-Net performs well across emotion recognition tasks, achieving particularly high accuracy and F1 scores for positive emotions such as "Happy" and "Relax." Compared with other benchmark models, CGF-Net shows significant advantages in both accuracy and stability. This study presents an effective solution for multimodal emotion recognition and demonstrates its potential in applications such as intelligent emotional interaction and music recommendation systems.
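To make the described architecture concrete, the sketch below shows one plausible way a 3D-CNN video branch, a GRU audio branch, and a fully connected fusion network could be wired together. The layer sizes, the 40-dimensional audio features, the four-class output, and the class count are illustrative assumptions for this sketch, not the configuration reported in the paper.

```python
# Hypothetical sketch of a CGF-Net-style fusion model (3D-CNN + GRU + FCN).
# All hyperparameters below are illustrative assumptions, not the authors' setup.
import torch
import torch.nn as nn

class CGFNetSketch(nn.Module):
    def __init__(self, audio_feat_dim=40, num_classes=4):
        super().__init__()
        # 3D-CNN branch: spatiotemporal features from video clips
        # expected input shape: (batch, 3, frames, height, width)
        self.video_branch = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # -> (batch, 32, 1, 1, 1)
            nn.Flatten(),             # -> (batch, 32)
        )
        # GRU branch: temporal dynamics of frame-level audio features
        # expected input shape: (batch, time_steps, audio_feat_dim)
        self.audio_branch = nn.GRU(audio_feat_dim, 64, batch_first=True)
        # Fully connected fusion network over the concatenated embeddings
        self.fusion = nn.Sequential(
            nn.Linear(32 + 64, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, video, audio):
        v = self.video_branch(video)     # (batch, 32)
        _, h = self.audio_branch(audio)  # h: (1, batch, 64)
        a = h.squeeze(0)                 # (batch, 64)
        return self.fusion(torch.cat([v, a], dim=1))  # class logits

# Example: two 16-frame 64x64 clips with 100 steps of 40-dim audio features
model = CGFNetSketch()
logits = model(torch.randn(2, 3, 16, 64, 64), torch.randn(2, 100, 40))
print(logits.shape)  # torch.Size([2, 4])
```

The design choice illustrated here is late fusion: each modality is encoded by the branch suited to its structure (3D convolutions for spatiotemporal video, a recurrent unit for sequential audio features), and the fully connected head combines the resulting embeddings for classification.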

Keywords