Applied Sciences (Mar 2024)

Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation

  • Di Wu
  • Peng Cheng
  • Yuying Zheng

DOI: https://doi.org/10.3390/app14062435
Journal volume & issue: Vol. 14, no. 6, p. 2435

Abstract

Summary generation is an important research direction in natural language processing. To address the difficulty of handling redundant information and the inability of existing models to generate high-quality summaries from long text, an N + 1 coarse-to-fine multistage summary generation framework is constructed with BART as the backbone model, and a multistage mixed-attention unsupervised keyword extraction model for summary generation (MSMAUKE-SummN) is proposed. In the N coarse-grained summary generation stages, a sentence filtering layer (PureText) removes redundant information from long text, and a mixed-attention unsupervised approach iteratively extracts keywords that assist summary inference and enrich the global semantic information of the coarse-grained summaries. In the single fine-grained summary generation stage, a self-attentive keyword selection module (KeywordSelect) obtains higher-weighted keywords to enhance the local semantic representation of the fine-grained summary. The N coarse-grained stages and the single fine-grained stage are run in tandem, so the long-text summary is obtained through a multistage generation process. Experimental results show that, on summarization datasets such as AMI, ICSI, and QMSum, the model improves the ROUGE-1, ROUGE-2, and ROUGE-L metrics by at least 0.75%, 1.48%, and 1.25%, respectively, over the HMNET, TextRank, HAT-BART, DDAMS, and SummN models.
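For illustration, the following is a minimal sketch of the N + 1 coarse-to-fine pipeline described in the abstract, not the authors' implementation: filter_sentences, extract_keywords, and the keyword-prepending prompt are simplified stand-ins for PureText, the mixed-attention unsupervised extractor, and KeywordSelect, and the facebook/bart-large-cnn checkpoint is an assumed substitute for the paper's BART backbone; all thresholds and chunk sizes are illustrative.

# Minimal sketch of an N + 1 coarse-to-fine summarization pipeline (assumptions noted above).
from collections import Counter

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")


def filter_sentences(sentences, min_tokens=5):
    # Crude stand-in for the PureText layer: drop very short and duplicate sentences.
    seen, kept = set(), []
    for s in sentences:
        s = s.strip()
        if len(s.split()) >= min_tokens and s not in seen:
            seen.add(s)
            kept.append(s)
    return kept


def extract_keywords(text, top_k=10):
    # Frequency-based placeholder for the mixed-attention unsupervised extractor.
    words = [w.lower().strip(".,;:!?") for w in text.split()]
    counts = Counter(w for w in words if len(w) > 4)
    return [w for w, _ in counts.most_common(top_k)]


def summarize(text, max_length=128):
    # One BART inference step; truncation bounds the encoder input.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    ids = model.generate(inputs["input_ids"], max_length=max_length, num_beams=4)
    return tokenizer.decode(ids[0], skip_special_tokens=True)


def msmauke_summn(document, n_stages=2, chunk_sents=20):
    # Naive sentence split; the paper's preprocessing is not specified here.
    sentences = filter_sentences(document.split(". "))
    # N coarse-grained stages: chunk, prepend extracted keywords, summarize, merge.
    for _ in range(n_stages):
        chunks = [
            ". ".join(sentences[i : i + chunk_sents])
            for i in range(0, len(sentences), chunk_sents)
        ]
        coarse = []
        for chunk in chunks:
            keywords = extract_keywords(chunk)
            coarse.append(summarize(", ".join(keywords) + ": " + chunk))
        sentences = filter_sentences(" ".join(coarse).split(". "))
    # 1 fine-grained stage: re-weight keywords over the merged coarse summary
    # (KeywordSelect analog) and generate the final summary.
    coarse_text = ". ".join(sentences)
    keywords = extract_keywords(coarse_text, top_k=5)
    return summarize(", ".join(keywords) + ": " + coarse_text)

Prepending keywords to the encoder input is one simple way to condition generation on extracted terms; the paper's actual keyword-guidance mechanism operates through attention rather than prompt concatenation.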

Keywords