Text-image semantic relevance identification for aspect-based multimodal sentiment analysis

Tianzhi Zhang; Gang Zhou; Jicang Lu; Zhibo Li; Hao Wu; Shuo Liu

doi:10.7717/peerj-cs.1904

PeerJ Computer Science (Apr 2024)

Affiliations

Tianzhi Zhang: Information Engineering University, Zhengzhou, Henan, China
Gang Zhou: Information Engineering University, Zhengzhou, Henan, China
Jicang Lu: Information Engineering University, Zhengzhou, Henan, China
Zhibo Li: Information Engineering University, Zhengzhou, Henan, China
Hao Wu: Information Engineering University, Zhengzhou, Henan, China
Shuo Liu: Information Engineering University, Zhengzhou, Henan, China

DOI: https://doi.org/10.7717/peerj-cs.1904
Journal volume & issue: Vol. 10, p. e1904

Abstract


Aspect-based multimodal sentiment analysis (ABMSA) is an emerging task in multimodal sentiment analysis research that aims to identify the sentiment of each aspect mentioned in a multimodal sample. Although recent research on ABMSA has achieved some success, most existing models only use an attention mechanism to let the aspect interact with the text and the image separately, and then obtain the sentiment output through multimodal concatenation. They often neglect that in some samples the text and the image have no semantic relevance to each other. In this article, we propose a Text-Image Semantic Relevance Identification (TISRI) model for ABMSA to address this problem. Specifically, we introduce a multimodal feature relevance identification module that calculates the semantic similarity between text and image, and then construct an image gate to dynamically control the input image information. On this basis, we provide auxiliary image information to enhance the semantic expressiveness of the visual feature representation and generate a more intuitive image representation. Furthermore, we employ an attention mechanism during multimodal feature fusion to obtain a text-aware image representation through text-image interaction, which prevents irrelevant image information from interfering with our model. Experiments demonstrate that TISRI achieves competitive results on two ABMSA Twitter datasets, validating the effectiveness of our methods.
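
The relevance-gating idea described in the abstract can be illustrated with a minimal sketch. It is not the TISRI implementation: the abstract does not say how similarity or the gate is computed, so the cosine similarity, the sigmoid gate, and the names `cosine_similarity`, `image_gate`, `gate_image`, `sharpness`, and `bias` are all assumptions introduced here for illustration only.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def image_gate(text_vec: List[float], image_vec: List[float],
               sharpness: float = 5.0, bias: float = 0.5) -> float:
    """Map text-image similarity to a gate value in (0, 1).

    Assumption: a sigmoid over the cosine similarity. `sharpness` and `bias`
    are hypothetical parameters, not values taken from the paper.
    """
    sim = cosine_similarity(text_vec, image_vec)
    return 1.0 / (1.0 + math.exp(-sharpness * (sim - bias)))


def gate_image(text_vec: List[float], image_vec: List[float]) -> List[float]:
    """Scale the image features by the gate, so weakly related images contribute less."""
    g = image_gate(text_vec, image_vec)
    return [g * v for v in image_vec]


if __name__ == "__main__":
    text = [1.0, 0.0]
    related_image = [1.0, 0.0]    # points the same way as the text vector
    unrelated_image = [0.0, 1.0]  # orthogonal to the text vector
    print(image_gate(text, related_image))
    print(image_gate(text, unrelated_image))
```

In this sketch a text-aligned image vector passes through the gate almost unchanged, while an orthogonal one is scaled toward zero. In a trained model the gate would presumably be computed from learned representations rather than from a fixed formula.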

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
