Jisuanji kexue yu tansuo (May 2021)
SFExt-PGAbs: Two-Stage Summarization Model for Long Document
Abstract
Extractive summarization methods tend to produce summaries that lack fluency, abstractive methods tend to produce summaries that lack accuracy, and truncating the original document before encoding discards important information. To address these three problems, this paper proposes SFExt-PGAbs, a two-stage summarization model for long documents. It combines SFExt, an extractive summarizer based on submodular functions, with PGAbs, an abstractive summarizer based on a pointer generator. SFExt-PGAbs imitates the way a human summarizes a long document. First, SFExt extracts the important sentences from the long document and filters out unimportant and redundant ones, producing a transitional document. Then, PGAbs takes the transitional document as input and generates a fluent and accurate summary. To obtain a transitional document that stays closer to the central idea of the original document, this paper extends the traditional SFExt with two additional sub-aspects, positional importance and accuracy, and designs a new greedy algorithm for it. To study how different feature extractors affect the quality of the generated summary, two kinds of recurrent neural networks are applied in PGAbs. Experimental results on the CNNDM test set show that SFExt-PGAbs generates more fluent and more accurate summaries than the baseline model, with significantly improved ROUGE scores. The results also show that SFExt with the added sub-aspects extracts more accurate summaries.
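The extractive stage described above can be illustrated with a minimal sketch of greedy sentence selection under a submodular objective. This is not the paper's SFExt: the objective below (distinct-word coverage plus a simple position bonus), the function names, and the `position_weight` parameter are all assumptions made for illustration, and the abstractive PGAbs stage is not shown.

```python
from typing import List, Set


def tokenize(sentence: str) -> Set[str]:
    """Lowercase the sentence and return its set of distinct words."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return {w for w in words if w}


def objective(selected: List[int], token_sets: List[Set[str]],
              position_weight: float) -> float:
    """Hypothetical objective: word coverage plus a bonus for early sentences.

    Coverage (size of a union of sets) is monotone submodular, and the
    position bonus is modular, so their sum is monotone submodular.
    """
    covered: Set[str] = set()
    for i in selected:
        covered |= token_sets[i]
    position_bonus = sum(1.0 / (i + 1) for i in selected)
    return len(covered) + position_weight * position_bonus


def greedy_extract(sentences: List[str], budget: int,
                   position_weight: float = 0.5) -> List[str]:
    """Pick up to `budget` sentences, each time taking the largest marginal gain."""
    token_sets = [tokenize(s) for s in sentences]
    selected: List[int] = []
    while len(selected) < budget:
        current = objective(selected, token_sets, position_weight)
        best_gain, best_idx = 0.0, None
        for i in range(len(sentences)):
            if i in selected:
                continue
            gain = objective(selected + [i], token_sets, position_weight) - current
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining sentence adds value
            break
        selected.append(best_idx)
    # Restore document order so the transitional document reads naturally.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    document = [
        "The cat sat on the mat.",
        "The cat sat on the mat.",
        "Dogs bark loudly at night.",
    ]
    for sentence in greedy_extract(document, budget=2):
        print(sentence)
```

In this toy input the second sentence duplicates the first, so its marginal coverage gain is zero and the greedy step prefers the third sentence. This mirrors, in a simplified form, how a submodular objective filters redundant sentences before the transitional document is passed to the abstractive stage.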
Keywords