PeerJ Computer Science (Feb 2024)

Heterogeneous text graph for comprehensive multilingual sentiment analysis: capturing short- and long-distance semantics

  • El Mahdi Mercha,
  • Houda Benbrahim,
  • Mohammed Erradi

DOI
https://doi.org/10.7717/peerj-cs.1876
Journal volume & issue
Vol. 10
p. e1876

Abstract


Multilingual sentiment analysis (MSA) is the task of understanding people’s opinions, sentiments, and emotions in written texts across multiple languages. The task has attracted considerable attention because it helps extract insights for decision-making in diverse fields such as marketing, finance, and politics. Several studies have explored MSA using deep learning methods. However, most of these studies rely on sequential approaches. These approaches capture short-distance semantics within adjacent word sequences but overlook long-distance semantics, which can provide deeper insights for analysis. In this work, we propose MSA-GCN, an approach for multilingual sentiment analysis that uses a graph convolutional network to capture both short- and long-distance semantics. MSA-GCN models the entire multilingual sentiment analysis corpus as a single, unified heterogeneous text graph. A moderately deep graph convolutional network is then used to learn predictive representations for all nodes while encouraging transfer learning across languages. Extensive experiments on various language combinations, using several benchmark datasets, assess the effectiveness of the proposed approach. These datasets are the Multilingual Amazon Reviews Corpus (MARC), the Internet Movie Database (IMDB), Allociné, and Muchocine. The results show that MSA-GCN significantly outperformed all baseline models on almost all datasets (p < 0.05, Student’s t-test). The approach also achieves strong results across a variety of language combinations, demonstrating its robustness to language variation.
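The abstract does not say how the heterogeneous graph is built or how the network is configured. The sketch below is a rough illustration only. It follows the widely used TextGCN recipe, which is an assumption and not necessarily the authors' method. In that recipe, document and word nodes share one graph. Document-word edges are weighted by TF-IDF, word-word edges by positive PMI over sliding windows, and stacked GCN layers compute H' = Â H W with a symmetrically normalized adjacency Â. The helper names (`build_text_graph`, `normalize`, `gcn_forward`), the toy reviews, the window size, and the random untrained weights are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

import numpy as np


def build_text_graph(docs, window=3):
    """Hypothetical helper: TextGCN-style graph with document nodes 0..D-1
    followed by one node per vocabulary word. Assumes non-empty docs."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    word_idx = {w: len(docs) + i for i, w in enumerate(vocab)}
    n = len(docs) + len(vocab)
    adj = np.zeros((n, n))

    # Document-word edges weighted by TF-IDF (assumed weighting).
    df = Counter(w for toks in tokenized for w in set(toks))
    for d, toks in enumerate(tokenized):
        for w, c in Counter(toks).items():
            tfidf = (c / len(toks)) * math.log(len(docs) / df[w])
            adj[d, word_idx[w]] = adj[word_idx[w], d] = tfidf

    # Word-word edges weighted by positive PMI over sliding windows
    # (assumed weighting); these link words that co-occur locally.
    windows = []
    for toks in tokenized:
        if len(toks) <= window:
            windows.append(toks)
        else:
            windows.extend(toks[i:i + window] for i in range(len(toks) - window + 1))
    total = len(windows)
    win_count = Counter(w for win in windows for w in set(win))
    pair_count = Counter(p for win in windows for p in combinations(sorted(set(win)), 2))
    for (a, b), c in pair_count.items():
        pmi = math.log((c / total) / ((win_count[a] / total) * (win_count[b] / total)))
        if pmi > 0:  # keep only positive associations
            adj[word_idx[a], word_idx[b]] = adj[word_idx[b], word_idx[a]] = pmi

    np.fill_diagonal(adj, 1.0)  # self-loops so every node keeps its own features
    return adj, vocab


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, as in standard GCNs."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(a_hat, x, weights):
    """Stacked GCN layers H' = A_hat H W, ReLU between layers, softmax at the end."""
    h = x
    for i, w in enumerate(weights):
        h = a_hat @ h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    h = h - h.max(axis=1, keepdims=True)  # numerically stable softmax
    e = np.exp(h)
    return e / e.sum(axis=1, keepdims=True)


# Toy usage: made-up reviews, random untrained weights (no training loop shown).
docs = ["great movie loved it", "film terrible ennuyeux", "película excelente great"]
adj, vocab = build_text_graph(docs)
a_hat = normalize(adj)
n = adj.shape[0]
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(n, 16)), rng.normal(scale=0.1, size=(16, 2))]
probs = gcn_forward(a_hat, np.eye(n), weights)
print(probs[: len(docs)])  # class probabilities for the three document nodes
```

In this toy graph, a word shared by reviews in different languages (here "great") becomes a bridge node. Each additional GCN layer then lets information travel one more hop, which reaches documents that share no adjacent words. That is one plausible reading of how a corpus-level graph captures long-distance semantics and supports cross-lingual transfer. Passing a longer `weights` list gives a deeper network.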

Keywords