Alexandria Engineering Journal (Mar 2025)
Real-time music emotion recognition based on multimodal fusion
Abstract
Multimodal emotion recognition is widely used in fields such as music emotion analysis and intelligent interaction. However, current models still face challenges in real-time performance and accuracy, particularly in multimodal data fusion and in handling emotional fluctuations. To address this, this paper proposes a real-time emotion recognition model based on Bi-LSTM and feature fusion, which improves the efficiency of emotional feature capture through multimodal feature compression and adaptive sampling. A Bi-LSTM network mines the temporal dependencies of the multimodal data, while a feature fusion module integrates key emotional features from audio, visual, and physiological signals, allowing the model to strike a good balance between accuracy and real-time performance. Experimental results show that the model achieves higher accuracy and lower latency on the DEAP and AMIGOS datasets; compared with existing methods, it yields significant improvements on multiple metrics, including weighted F1 score and G-Mean. Ablation experiments further confirm that the Bi-LSTM, feature fusion, and adaptive sampling modules each make important contributions to the robustness and real-time performance of the model in emotion recognition tasks. This research provides an effective solution for multimodal emotion recognition, verifies the application potential of multimodal feature fusion in music emotion analysis, and offers theoretical and practical support for optimizing future affective computing systems.
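To make the described architecture concrete, the following is a minimal sketch (not the authors' code) of the pipeline the abstract outlines: per-modality projections standing in for feature compression, a concatenation-based fusion module, and a Bi-LSTM over the fused sequence. All layer names, dimensions, and the four-class output are illustrative assumptions.

```python
# Minimal sketch of a Bi-LSTM + feature-fusion emotion classifier.
# Assumptions: time-aligned modality streams, concatenation fusion,
# and classification from the final Bi-LSTM time step.
import torch
import torch.nn as nn

class MultimodalBiLSTM(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=64, physio_dim=32,
                 fused_dim=128, hidden_dim=128, num_classes=4):
        super().__init__()
        # Compress each modality to a shared size before fusion
        # (stands in for the paper's multimodal feature compression).
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        self.physio_proj = nn.Linear(physio_dim, fused_dim)
        # Fusion: concatenate projected features, then mix them.
        self.fuse = nn.Linear(3 * fused_dim, fused_dim)
        # Bi-LSTM captures temporal dependencies in the fused sequence.
        self.bilstm = nn.LSTM(fused_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, audio, visual, physio):
        # Each input: (batch, time, feature_dim), already time-aligned.
        fused = torch.cat([self.audio_proj(audio),
                           self.visual_proj(visual),
                           self.physio_proj(physio)], dim=-1)
        fused = torch.relu(self.fuse(fused))
        out, _ = self.bilstm(fused)          # (batch, time, 2*hidden_dim)
        return self.classifier(out[:, -1])   # classify from last time step

# Example: a batch of 8 windows, 100 time steps each.
model = MultimodalBiLSTM()
logits = model(torch.randn(8, 100, 40),
               torch.randn(8, 100, 64),
               torch.randn(8, 100, 32))
print(logits.shape)  # torch.Size([8, 4])
```

In a real-time setting, the adaptive sampling the abstract mentions would decide how densely to feed windows to such a model; that control logic is omitted here since the abstract does not specify it.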