DGHC: A Hybrid Algorithm for Multi-Modal Named Entity Recognition Using Dynamic Gating and Correlation Coefficients With Visual Enhancements

Chang Liu; Dongsheng Yang; Bihui Yu; Liping Bu

doi:10.1109/ACCESS.2024.3400250

IEEE Access (Jan 2024)

Affiliations

Chang Liu: Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, China
Dongsheng Yang: Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, China
Bihui Yu: Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, China
Liping Bu: Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang, China

Journal volume & issue: Vol. 12, pp. 69151–69162

Abstract

Multimodal named entity recognition (MNER) plays a crucial role in knowledge graph construction: the quality of entity extraction and classification directly determines the quality of the resulting graph. However, most existing MNER algorithms do not consider how strongly the text and the image are correlated. They either use the image's visual features to compute attention over the text modality or fuse them directly with the textual features. Multimodal tweets that contain both text and an image fall into three categories according to this correlation: text that is related to the image, text that is partially related to the image, and text that is unrelated to the image. Using irrelevant or only partially relevant image features as cross-modal attention for the text can produce an incorrect text representation, which leads to misclassified entities and degrades model performance. To address the uncertainty and negative impact caused by missing or partial correlation between text and image, this paper proposes a visually enhanced text representation algorithm based on a hybrid of dynamic gating and a correlation coefficient. We conducted experiments on two benchmark datasets, Twitter-2015 and Twitter-2017, and analyzed the results to show the strengths of the proposed model.
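
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of scaling visual features by a text-image correlation score and a learned gate before adding them to the text representation. It is not the authors' DGHC implementation. The function names (`relevance_score`, `gated_visual_fusion`), the use of a Pearson coefficient over mean-pooled features, the gate parameterization, and all shapes are assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def relevance_score(text, image):
    """Hypothetical text-image relevance in [0, 1].

    Assumption: Pearson correlation between the mean-pooled token
    features (T, d) -> (d,) and the pooled image feature (d,),
    with negative correlations clipped to 0.
    """
    pooled_text = text.mean(axis=0)
    r = np.corrcoef(pooled_text, image)[0, 1]
    return float(np.clip(r, 0.0, 1.0))


def gated_visual_fusion(text, image, relevance, w_gate, b_gate):
    """Add relevance-scaled, gated image features to each token.

    text:      (T, d) token features
    image:     (d,)   pooled image feature
    relevance: scalar in [0, 1]; 0 means the image is ignored
    w_gate:    (2d, d) gate weights (would be learned; random here)
    b_gate:    (d,)    gate bias
    """
    img = np.broadcast_to(image, text.shape)               # (T, d)
    gate_in = np.concatenate([text, img], axis=-1)         # (T, 2d)
    gate = sigmoid(gate_in @ w_gate + b_gate)              # (T, d), values in (0, 1)
    return text + relevance * gate * img


# Usage with random stand-in features (assumed sizes: 5 tokens, d = 8).
rng = np.random.default_rng(0)
text = rng.normal(size=(5, 8))
image = rng.normal(size=8)
w_gate = rng.normal(size=(16, 8))
b_gate = np.zeros(8)

score = relevance_score(text, image)
enhanced = gated_visual_fusion(text, image, score, w_gate, b_gate)
```

In this sketch a relevance of 0 leaves the text features unchanged, which mirrors the abstract's concern that unrelated images should not distort the text representation.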

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
