Applied Sciences (Aug 2024)

Multimodal Sentiment Classifier Framework for Different Scene Contexts

  • Nelson Silva,
  • Pedro J. S. Cardoso,
  • João M. F. Rodrigues

DOI
https://doi.org/10.3390/app14167065
Journal volume & issue
Vol. 14, no. 16
p. 7065

Abstract

Sentiment analysis (SA) is an effective method for determining public opinion. Social media posts have been the subject of much research because the platforms’ enormous and diverse user bases regularly share thoughts on nearly any subject. However, in posts composed of a text–image pair, the written description may or may not convey the same sentiment as the image. The present study uses machine learning models for the automatic sentiment evaluation of pairs of text and image(s). The sentiments derived from the image and the text are evaluated independently and merged (or not) to form the overall sentiment, returning both the sentiment of the post and the discrepancy between the sentiments represented by the text–image pair. Image sentiment classification is divided into four scene categories: “indoor” (IND), “man-made outdoors” (OMM), “non-man-made outdoors” (ONMM), and “indoor/outdoor with persons in the background” (IOwPB). The four category classifiers are then ensembled into an image sentiment classification model (ISC), which is compared with a holistic image sentiment classifier (HISC); the ISC achieves better results than the HISC. For the Flickr sub-data set, the sentiment classification of images achieved an accuracy of 68.50% for IND, 83.20% for OMM, 84.50% for ONMM, 84.80% for IOwPB, and 76.45% for the ISC, compared to 65.97% for the HISC. For the text sentiment classification, an accuracy of 92.10% was achieved on a sub-data set of B-T4SA. Finally, the text–image combination achieved an accuracy of 78.84% on the authors’ private data set.
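
The idea of merging independently obtained text and image sentiments, and reporting how far they disagree, can be illustrated with a minimal sketch. The abstract does not state the fusion rule or the discrepancy measure, so everything here is an assumption made for illustration: the three-class label set, the weighted average of class scores, the `text_weight` parameter, the use of half the L1 distance as the discrepancy, and the function names `normalize` and `fuse`. It is not the authors' method.

```python
from typing import Dict, Tuple

# Assumed label set; the abstract does not list the sentiment classes.
LABELS = ("negative", "neutral", "positive")


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative class scores so that they sum to 1."""
    total = sum(scores[label] for label in LABELS)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {label: scores[label] / total for label in LABELS}


def fuse(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    text_weight: float = 0.5,
) -> Tuple[str, float]:
    """Return (post sentiment, text-image discrepancy).

    Hypothetical fusion rule: weighted average of the two score vectors.
    Hypothetical discrepancy: half the L1 distance between them, which is
    0 when text and image agree exactly and 1 when they are disjoint.
    """
    text = normalize(text_scores)
    image = normalize(image_scores)
    fused = {
        label: text_weight * text[label] + (1.0 - text_weight) * image[label]
        for label in LABELS
    }
    post_label = max(LABELS, key=lambda label: fused[label])
    discrepancy = 0.5 * sum(abs(text[label] - image[label]) for label in LABELS)
    return post_label, discrepancy


if __name__ == "__main__":
    # Illustrative scores only, not results from the paper.
    text = {"negative": 0.1, "neutral": 0.2, "positive": 0.7}
    image = {"negative": 0.6, "neutral": 0.3, "positive": 0.1}
    print(fuse(text, image))
```

Under these assumptions, a post whose caption reads as positive but whose image reads as negative still receives a single overall label, while the discrepancy value flags that the two modalities disagree.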

Keywords