Jisuanji kexue yu tansuo (Feb 2025)

Dual-Layer Fusion Knowledge Reasoning with Enhanced Multi-modal Features

JING Boxiang, WANG Hairong, WANG Tong, YANG Zhenye

DOI: https://doi.org/10.3778/j.issn.1673-9418.2312065
Journal volume & issue: Vol. 19, No. 2, pp. 406–416

Abstract

Most existing multi-modal knowledge reasoning methods fuse the features extracted by pre-trained models directly, through concatenation or attention, and thus often ignore the heterogeneity of the different modalities and the complexity of their interactions. To address this, a dual-layer fusion knowledge reasoning method with enhanced multi-modal features is proposed. The structural information embedding module uses an adaptive graph attention mechanism to filter and aggregate key neighbor information, enhancing the semantic representation of entity and relation embeddings. The multi-modal information embedding module applies separate attention mechanisms to the features unique to each modality and to the features shared across modalities, and exploits the complementary information in the shared features for cross-modal interaction, reducing the heterogeneity gap between modalities. The multi-modal feature fusion module adopts a dual-layer strategy that combines low-rank multi-modal feature fusion with decision fusion, capturing the dynamic intra- and inter-modal interactions of the multi-modal data while weighing each modality's contribution to the inference, yielding more comprehensive predictions. To verify the effectiveness of the proposed method, experiments are carried out on the FB15K-237, DB15K, and YAGO15K datasets. On FB15K-237, the method improves MRR and Hits@1 by an average of 3.6% and 2.2%, respectively, over multi-modal reasoning methods, and by an average of 13.7% and 14.6%, respectively, over single-modal reasoning methods.
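The abstract names low-rank multi-modal feature fusion followed by decision fusion as the dual-layer strategy, but this page carries no implementation details. The sketch below illustrates the general idea under stated assumptions: the modality names, feature dimensions, rank, and per-modality scorers are all hypothetical, and the first layer follows the standard low-rank fusion formulation (a sum over rank-1 projections combined by element-wise product across modalities), not necessarily the authors' exact design.

```python
# Minimal sketch of a dual-layer fusion head: low-rank multi-modal feature
# fusion followed by decision fusion. All dimensions, the rank, and the
# module layout are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn


class DualLayerFusion(nn.Module):
    def __init__(self, dims, d_out, rank=4):
        """dims: per-modality input sizes, e.g. {"struct": 200, "img": 128, "txt": 300}."""
        super().__init__()
        self.names = list(dims)
        self.rank, self.d_out = rank, d_out
        # Layer 1: low-rank fusion. Each modality feature z_m (with a constant
        # 1 appended so unimodal terms survive the product) is projected to
        # `rank` vectors of size d_out; the fused feature is the sum over
        # ranks of the element-wise product across modalities.
        self.factors = nn.ModuleDict(
            {m: nn.Linear(d + 1, rank * d_out, bias=False) for m, d in dims.items()}
        )
        # Layer 2: decision fusion. One score per modality plus one from the
        # fused feature, combined with learned softmax-normalized weights so
        # each modality's contribution to the final prediction is explicit.
        self.scorers = nn.ModuleDict(
            {m: nn.Linear(d, d_out) for m, d in dims.items()}
        )
        self.fused_scorer = nn.Linear(d_out, d_out)
        self.decision_w = nn.Parameter(torch.zeros(len(dims) + 1))

    def forward(self, feats):
        """feats: dict of modality name -> tensor of shape (batch, d_m)."""
        batch = next(iter(feats.values())).shape[0]
        one = feats[self.names[0]].new_ones(batch, 1)
        # Low-rank fusion: (batch, rank, d_out) per modality, element-wise
        # product over modalities, then sum over the rank dimension.
        prod = None
        for m in self.names:
            f = self.factors[m](torch.cat([feats[m], one], dim=-1))
            f = f.view(batch, self.rank, self.d_out)
            prod = f if prod is None else prod * f
        fused = prod.sum(dim=1)  # (batch, d_out)
        # Decision fusion: weighted sum of per-modality and fused scores.
        scores = [self.scorers[m](feats[m]) for m in self.names]
        scores.append(self.fused_scorer(fused))
        w = torch.softmax(self.decision_w, dim=0)
        return sum(wi * s for wi, s in zip(w, scores))


if __name__ == "__main__":
    dims = {"struct": 200, "img": 128, "txt": 300}
    model = DualLayerFusion(dims, d_out=64)
    batch = {m: torch.randn(8, d) for m, d in dims.items()}
    print(model(batch).shape)  # torch.Size([8, 64])
```

Appending the constant 1 before each low-rank projection is the standard trick that preserves unimodal and lower-order interaction terms inside the cross-modal product; in the actual method the fused output would be scored against candidate entities for link prediction on the FB15K-237-style triples, whereas the dummy forward pass here only demonstrates the shapes.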
