Jisuanji kexue yu tansuo (Feb 2025)

Dual-Layer Fusion Knowledge Reasoning with Enhanced Multi-modal Features

JING Boxiang, WANG Hairong, WANG Tong, YANG Zhenye

DOI: https://doi.org/10.3778/j.issn.1673-9418.2312065
Journal volume & issue: Vol. 19, No. 2, pp. 406–416

Abstract

Most existing multi-modal knowledge reasoning methods fuse the features extracted by pre-trained models directly, through concatenation or attention, and thus often ignore the heterogeneity of the different modalities and the complexity of their interactions. To address this, a dual-layer fusion knowledge reasoning method with enhanced multi-modal features is proposed. The structural information embedding module uses an adaptive graph attention mechanism to filter and aggregate key neighbor information, enhancing the semantic representation of entity and relation embeddings. The multi-modal information embedding module applies separate attention mechanisms to the features unique to each modality and to the features shared across modalities, and exploits the complementary information in the shared features for cross-modal interaction, reducing the heterogeneity gap between modalities. The multi-modal feature fusion module adopts a dual-layer strategy that combines low-rank multi-modal feature fusion with decision fusion, capturing the dynamic intra- and inter-modal interactions of the multi-modal data while weighing each modality's contribution to the inference, yielding more comprehensive predictions. To verify the effectiveness of the proposed method, experiments are carried out on the FB15K-237, DB15K, and YAGO15K datasets. On FB15K-237, the method improves MRR and Hits@1 by an average of 3.6% and 2.2%, respectively, over multi-modal reasoning methods, and by an average of 13.7% and 14.6%, respectively, over single-modal reasoning methods.
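The abstract names low-rank multi-modal feature fusion followed by decision fusion as the dual-layer strategy, but this page carries no implementation details. The sketch below illustrates the general idea under stated assumptions: the modality names, feature dimensions, rank, and per-modality scorers are all hypothetical, and the first layer follows the standard low-rank fusion formulation (a sum over rank-1 projections combined by element-wise product across modalities), not necessarily the authors' exact design.

```python
# Minimal sketch of a dual-layer fusion head: low-rank multi-modal feature
# fusion followed by decision fusion. All dimensions, the rank, and the
# module layout are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class DualLayerFusion(nn.Module):
    def __init__(self, dims, d_out, rank=4):
        """dims: per-modality input sizes, e.g. {"struct": 200, "img": 128, "txt": 300}."""
        super().__init__()
        self.names = list(dims)
        self.rank, self.d_out = rank, d_out
        # Layer 1: low-rank fusion. Each modality feature z_m (with a constant
        # 1 appended so unimodal terms survive the product) is projected to
        # `rank` vectors of size d_out; the fused feature is the sum over
        # ranks of the element-wise product across modalities.
        self.factors = nn.ModuleDict(
            {m: nn.Linear(d + 1, rank * d_out, bias=False) for m, d in dims.items()}
        )
        # Layer 2: decision fusion. One score per modality plus one from the
        # fused feature, combined with learned softmax-normalized weights so
        # each modality's contribution to the final prediction is explicit.
        self.scorers = nn.ModuleDict(
            {m: nn.Linear(d, d_out) for m, d in dims.items()}
        )
        self.fused_scorer = nn.Linear(d_out, d_out)
        self.decision_w = nn.Parameter(torch.zeros(len(dims) + 1))

    def forward(self, feats):
        """feats: dict of modality name -> tensor of shape (batch, d_m)."""
        batch = next(iter(feats.values())).shape[0]
        one = feats[self.names[0]].new_ones(batch, 1)
        # Low-rank fusion: (batch, rank, d_out) per modality, element-wise
        # product over modalities, then sum over the rank dimension.
        prod = None
        for m in self.names:
            f = self.factors[m](torch.cat([feats[m], one], dim=-1))
            f = f.view(batch, self.rank, self.d_out)
            prod = f if prod is None else prod * f
        fused = prod.sum(dim=1)  # (batch, d_out)
        # Decision fusion: weighted sum of per-modality and fused scores.
        scores = [self.scorers[m](feats[m]) for m in self.names]
        scores.append(self.fused_scorer(fused))
        w = torch.softmax(self.decision_w, dim=0)
        return sum(wi * s for wi, s in zip(w, scores))


if __name__ == "__main__":
    dims = {"struct": 200, "img": 128, "txt": 300}
    model = DualLayerFusion(dims, d_out=64)
    batch = {m: torch.randn(8, d) for m, d in dims.items()}
    print(model(batch).shape)  # torch.Size([8, 64])
```

Appending the constant 1 before each low-rank projection is the standard trick that preserves unimodal and lower-order interaction terms inside the cross-modal product; in the actual method the fused output would be scored against candidate entities for link prediction on the FB15K-237-style triples, whereas the dummy forward pass here only demonstrates the shapes.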
