SFExt-PGAbs: Two-Stage Summarization Model for Long Document

ZHOU Weixiao, LAN Wenfei, XU Zhiming, ZHU Rongbo

doi:10.3778/j.issn.1673-9418.2006002

Jisuanji kexue yu tansuo (May 2021)

Affiliations

ZHOU Weixiao, LAN Wenfei, XU Zhiming, ZHU Rongbo: 1. School of Computer Science, South-Central University for Nationalities, Wuhan 430074, China 2. School of Mechanical Engineering and Automation, Fuzhou University, Fuzhou 350108, China

DOI: https://doi.org/10.3778/j.issn.1673-9418.2006002
Journal volume & issue: Vol. 15, no. 5
pp. 907 – 921

Abstract

To address the fluency problem of extractive methods, the accuracy problem of abstractive methods, and the loss of important information caused by truncating the original document before encoding, this paper proposes SFExt-PGAbs, a two-stage model for long document summarization. The model consists of SFExt, a submodular-function-based extractive summarizer, and PGAbs, a pointer-generator abstractive summarizer. SFExt-PGAbs mimics the way a human summarizes a long document. First, SFExt extracts important sentences from the long document and filters out unimportant and redundant sentences to form a transitional document. Then, PGAbs takes the transitional document as input and generates a fluent and accurate summary. To obtain a transitional document that is closer to the central idea of the original document, this paper adds two sub-aspects, positional importance and accuracy, to the traditional SFExt and designs a new greedy algorithm. To study how different feature extractors affect the quality of the generated summary, two kinds of recurrent neural networks are applied in PGAbs. Experimental results on the CNNDM test set show that SFExt-PGAbs generates more fluent and more accurate summaries than the baseline model, with significantly improved ROUGE scores. In addition, the added sub-aspects allow SFExt to extract more accurate summaries.
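The extractive stage described in the abstract can be illustrated with a minimal sketch of greedy sentence selection under a submodular objective. This is not the paper's SFExt: the abstract does not specify the objective, the positional importance and accuracy sub-aspects, or the new greedy algorithm, so the sketch assumes a generic facility-location coverage score over bag-of-words cosine similarity. All function names (`bow`, `cosine`, `coverage`, `greedy_extract`) and the sentence budget are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector for a sentence (assumed whitespace tokenization)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coverage(selected, sims):
    """Facility-location score: each sentence is credited with its
    similarity to the closest selected sentence. This function is
    monotone and submodular, which is what makes greedy selection reasonable."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sims)


def greedy_extract(sentences, budget):
    """Greedily pick up to `budget` sentences by largest marginal gain.
    Redundant sentences add no gain, so they are filtered out."""
    vecs = [bow(s) for s in sentences]
    n = len(sentences)
    sims = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    selected = []
    while len(selected) < min(budget, n):
        current = coverage(selected, sims)
        best, best_gain = None, 0.0
        for c in range(n):
            if c in selected:
                continue
            gain = coverage(selected + [c], sims) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate improves the objective
            break
        selected.append(best)
    # Restore document order so the transitional document reads naturally.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "stocks fell sharply on monday",
        "a cat sat on a mat",
    ]
    print(greedy_extract(doc, budget=2))
```

In a two-stage setup of the kind the abstract describes, the returned sentences would form the transitional document passed to the abstractive model. The abstractive stage is omitted here because it depends on a trained pointer-generator network.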

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org
