Cross-modal knowledge guided model for abstractive summarization

Hong Wang; Jin Liu; Mingyang Duan; Peizhu Gong; Zhongdai Wu; Junxiang Wang; Bing Han

doi:10.1007/s40747-023-01170-9

Complex & Intelligent Systems (Jul 2023)

Affiliations

Hong Wang: College of Information Engineering, Shanghai Maritime University
Jin Liu: College of Information Engineering, Shanghai Maritime University
Mingyang Duan: College of Information Engineering, Shanghai Maritime University
Peizhu Gong: College of Information Engineering, Shanghai Maritime University
Zhongdai Wu: COSCO Shipping Technology Co., Ltd
Junxiang Wang: COSCO Shipping Technology Co., Ltd
Bing Han: Shanghai Ship and Shipping Research Institute

DOI: https://doi.org/10.1007/s40747-023-01170-9
Journal volume & issue: Vol. 10, no. 1, pp. 577–594

Abstract

Abstractive summarization (AS) aims to generate more flexible and informative descriptions than extractive summarization. Nevertheless, it often distorts or fabricates facts in the original article. To address this problem, some existing approaches attempt to evaluate or verify factual consistency, or design models to reduce factual errors. However, most of these efforts either have limited effect or reduce factual errors at the cost of lower ROUGE scores. In other words, it is challenging to improve factual consistency while maintaining the informativeness of generated summaries. Inspired by knowledge graph embedding techniques, we propose a novel cross-modal knowledge guided model (CKGM) for AS, which embeds a multimodal knowledge graph (MKG), combining image entity-relationship information and textual factual information (FI), into BERT to accomplish cross-modal information interaction and knowledge expansion. Pre-training provides rich contextual semantic information, while the knowledge graph supplements the textual information. In addition, an entity memory embedding algorithm is proposed to improve information fusion efficiency and model training speed. We conducted ablation experiments and evaluated our model on the Visual Genome, FewRel, MSCOCO, and CNN/DailyMail datasets. Experimental results demonstrate that our model can significantly improve the FI consistency and informativeness of generated summaries.

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
