IEEE Access (Jan 2024)

Federated Learning for Multimodal Sentiment Analysis: Advancing Global Models With an Enhanced LinkNet Architecture

  • P. Vasanthi,
  • V. Madhu Viswanatham

DOI
https://doi.org/10.1109/ACCESS.2024.3503290
Journal volume & issue
Vol. 12
pp. 175218 – 175239

Abstract


Analyzing sentiments using single-modal approaches, such as text or image analysis alone, frequently encounters significant limitations. These drawbacks include inadequate feature representation, an inability to capture the full complexity of emotional expressions, and challenges in handling diverse and noisy data types. This underscores the need for a more comprehensive approach, capable of integrating multiple types of data to offer a richer and more nuanced understanding of sentiment. To address these challenges, Multimodal Sentiment Analysis (MSA) has emerged as a crucial advancement. In this article, an Enhanced LinkNet (EnLNet)-based Federated Learning (FL) approach is proposed for MSA. This approach utilizes an EnLNet model within an FL framework to manage and process multimodal data, including text, signals, and images. The EnLNet model extends the standard LinkNet architecture with modified encoder and decoder blocks, interpolation blocks, and a modified activation function. The approach is structured in three stages: the initialization stage, where global parameters are established and shared among clients; the local training stage, where several local models independently handle preprocessing, feature extraction, and fusion of text (with Improved Aspect Term Extraction (IATE) and Term Frequency-Inverse Document Frequency (TF-IDF)), signals (using Modified Mel Frequency Cepstral Coefficients (MMFCC) and spectral features), and images (through an Improved Active Appearance Model (IAAM) and Median Binary Pattern (MBP)) before training the EnLNet model; and the model aggregation stage, where updates from local models are collected and aggregated by the central server to refine the global model. This iterative process continues until convergence or until a maximum number of iterations is reached.
The efficiency of this approach is validated through Accuracy, Precision, False Negative Rate (FNR), False Positive Rate (FPR), and performance comparisons with state-of-the-art approaches, demonstrating its capacity to enhance MSA by successfully integrating and processing diverse multimodal data.
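The three-stage federated procedure described in the abstract (global initialization, independent local training, server-side aggregation) follows the general FedAvg pattern. The sketch below is a minimal, hypothetical illustration of that loop only: `local_update` is a placeholder standing in for each client's EnLNet training on its own multimodal features, and all names and parameters are assumptions, not the authors' implementation.

```python
import numpy as np

def local_update(global_weights, data, lr=0.1):
    # Placeholder for Stage 2: in the paper, each client would train the
    # EnLNet model on locally extracted text/signal/image features. Here a
    # single step nudges the weights toward the local data mean.
    return global_weights + lr * (data.mean(axis=0) - global_weights)

def federated_round(global_weights, client_datasets):
    # Stage 1: the server broadcasts the current global parameters.
    local_models = [
        # Stage 2: each client trains independently on its private data.
        local_update(global_weights.copy(), data)
        for data in client_datasets
    ]
    # Stage 3: the server aggregates client updates (FedAvg-style mean).
    return np.mean(local_models, axis=0)

# Toy setup: three clients with differently distributed local data.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=c, size=(20, 4)) for c in (0.0, 1.0, 2.0)]

w = np.zeros(4)  # global parameters after initialization
for _ in range(50):  # iterate until convergence or a round limit
    w = federated_round(w, clients)
```

After enough rounds the averaged update drives the global weights toward a consensus over all clients' data, which is the role the central server's aggregation stage plays in the proposed approach.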

Keywords