IEEE Access (Jan 2024)
SentDep: Pioneering Fusion-Centric Multimodal Sentiment Analysis for Unprecedented Performance and Insights
Abstract
Multimodal sentiment analysis (MSA) is an emerging field focused on interpreting complex human emotions and expressions by integrating various data types, including text, audio, and visuals. Addressing the challenges in this area, we introduce SentDep, a fusion-centric framework that combines modern fusion methods with deep learning architectures. Designed to effectively blend the distinct features of textual, acoustic, and visual data, SentDep produces a unified and expressive representation of multimodal input. Extensive experiments on the widely used CMU-MOSI and CMU-MOSEI benchmarks show that SentDep surpasses current leading models, setting a new standard in MSA performance. We conducted thorough ablation studies and supplementary experiments to identify what drives SentDep's success. These studies highlight the importance of pre-training data size, the relative effectiveness of different fusion techniques, and the critical role of temporal information in enhancing the model's capabilities.
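To make the fusion idea concrete, the sketch below shows a generic concatenation-based late-fusion baseline that merges per-modality feature vectors into a single representation. This is an illustrative sketch only, not SentDep's actual fusion method; the feature dimensions, the `concat_fuse` helper, and the random projection are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature vectors; the dimensions below are
# illustrative placeholders, not values from the paper.
text_feat = rng.standard_normal(768)    # e.g. a sentence-level text embedding
audio_feat = rng.standard_normal(74)    # e.g. frame-averaged acoustic features
visual_feat = rng.standard_normal(47)   # e.g. frame-averaged visual features

def concat_fuse(text, audio, visual, out_dim=128, seed=0):
    """Late fusion by concatenation plus a fixed random linear projection.

    A minimal baseline sketch: concatenate the three modality vectors,
    project to out_dim, and squash with tanh.
    """
    fused = np.concatenate([text, audio, visual])
    w = np.random.default_rng(seed).standard_normal((out_dim, fused.size))
    return np.tanh(w @ fused / np.sqrt(fused.size))

rep = concat_fuse(text_feat, audio_feat, visual_feat)
print(rep.shape)  # (128,)
```

In practice the projection would be a learned layer, and stronger fusion schemes (attention-based or tensor-based) operate on full temporal sequences rather than pooled vectors, which is where the temporal information discussed in the ablations comes in.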
Keywords