Applied Sciences (May 2023)

Visual Description Augmented Integration Network for Multimodal Entity and Relation Extraction

  • Min Zuo,
  • Yingjun Wang,
  • Wei Dong,
  • Qingchuan Zhang,
  • Yuanyuan Cai,
  • Jianlei Kong

DOI
https://doi.org/10.3390/app13106178
Journal volume & issue
Vol. 13, no. 10
p. 6178

Abstract

Multimodal Named Entity Recognition (MNER) and Multimodal Relation Extraction (MRE) play an important role in processing multimodal data and understanding entity relationships across textual and visual domains. However, irrelevant image content may introduce noise that misleads recognition. Additionally, visual and semantic features originate from different modalities, and this modal disparity hinders semantic alignment. Therefore, this paper proposes the Visual Description Augmented Integration Network (VDAIN), which introduces an image description generation technique so that the semantic features derived from image descriptions lie in the same modality as the semantic features of the text. This not only narrows the modal gap but also more accurately captures the high-level semantics and underlying visual structure of the images. VDAIN then adaptively fuses the visual features, the semantic features of the image descriptions, and the textual information, filtering out irrelevant modal noise. On three public datasets, the proposed model achieves F1 scores of 75.8% and 87.78% on the MNER task and 82.54% on the MRE task, respectively, significantly outperforming the baseline models. The experimental results demonstrate the effectiveness of the proposed method in addressing the modal noise and modal gap problems.
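For readers wanting intuition for the adaptive fusion the abstract describes, the following is a minimal illustrative sketch, not the authors' released code: text features, visual features, and semantic features of a generated image description are combined through learned gates that down-weight irrelevant modal signals. The module name `GatedFusion`, the dimensions, and the gating scheme are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative sketch (not the paper's implementation) of adaptive
    multimodal fusion: visual and caption-description features are gated
    against the text representation to suppress irrelevant modal noise."""

    def __init__(self, dim: int = 768):
        super().__init__()
        # Gates conditioned on the text plus each auxiliary modality
        self.visual_gate = nn.Linear(2 * dim, dim)
        self.caption_gate = nn.Linear(2 * dim, dim)
        # Final projection back to the shared feature dimension
        self.out = nn.Linear(3 * dim, dim)

    def forward(self, text, visual, caption):
        # All inputs: (batch, seq_len, dim), already projected to a shared space
        g_v = torch.sigmoid(self.visual_gate(torch.cat([text, visual], dim=-1)))
        g_c = torch.sigmoid(self.caption_gate(torch.cat([text, caption], dim=-1)))
        fused = torch.cat([text, g_v * visual, g_c * caption], dim=-1)
        return self.out(fused)

# Usage with dummy tensors (shapes are assumptions)
fusion = GatedFusion(dim=768)
t = torch.randn(2, 32, 768)   # text features, e.g. from a BERT-style encoder
v = torch.randn(2, 32, 768)   # visual features projected into the text space
c = torch.randn(2, 32, 768)   # features of the generated image description
out = fusion(t, v, c)          # -> (2, 32, 768)
```

The design intuition is that because the image description is itself text, its features share a modality with the sentence, so the gates can compare like with like when deciding how much visual signal to admit.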

Keywords