Applied Sciences (Mar 2024)
Multistage Mixed-Attention Unsupervised Keyword Extraction for Summary Generation
Abstract
Summary generation is an important research direction in natural language processing. To address the difficulty of handling redundant information and the inability of existing summarization models to generate high-quality summaries from long text, we construct an N + 1 coarse-to-fine-grained multistage summary generation framework with BART as the backbone model and propose a multistage mixed-attention unsupervised keyword extraction model for summary generation (MSMAUKE-SummN). In the N coarse-grained summary generation stages, a sentence filtering layer (PureText) removes redundant information from the long text, and a mixed-attention unsupervised approach iteratively extracts keywords that assist summary inference and enrich the global semantic information of the coarse-grained summaries. In the single fine-grained summary generation stage, a self-attentive keyword selection module (KeywordSelect) retains the keywords with higher weights to enhance the local semantic representation of the fine-grained summary. The N coarse-grained stages and the single fine-grained stage are run in tandem to obtain long-text summaries through multistage generation. Experimental results on the AMI, ICSI, and QMSum summarization datasets show that the model improves the ROUGE-1, ROUGE-2, and ROUGE-L metrics by at least 0.75%, 1.48%, and 1.25%, respectively, over the HMNET, TextRank, HAT-BART, DDAMS, and SummN models.
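The N + 1 pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the sentence filter is a simple length/duplicate heuristic standing in for PureText, frequency counting stands in for mixed-attention keyword extraction, and the coarse "summaries" are placeholder extractive picks where the real model would run a BART decoder.

```python
from collections import Counter
import re

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def pure_text(sentences, min_words=4):
    # Stand-in for the PureText filtering layer: drop very short
    # sentences and exact duplicates to remove redundant information.
    seen, kept = set(), []
    for s in sentences:
        key = s.lower().strip()
        if key not in seen and len(key.split()) >= min_words:
            seen.add(key)
            kept.append(s)
    return kept

def extract_keywords(text, top_k=5):
    # Frequency-based stand-in for mixed-attention unsupervised
    # keyword extraction: most common non-stopword tokens.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]
    return [w for w, _ in Counter(tokens).most_common(top_k)]

def coarse_stage(segment, top_k=5):
    # One of the N coarse-grained stages: filter, then extract
    # keywords to enrich the coarse summary's global semantics.
    kept = pure_text(segment)
    keywords = extract_keywords(" ".join(kept), top_k)
    # A real system would condition a BART decoder on (kept, keywords);
    # here the "coarse summary" is simply the first retained sentence.
    summary = kept[0] if kept else ""
    return summary, keywords

def keyword_select(keyword_lists, top_k=3):
    # Stand-in for the KeywordSelect module: keep the globally
    # highest-weighted (here, most frequent) keywords.
    pooled = Counter(w for ks in keyword_lists for w in ks)
    return [w for w, _ in pooled.most_common(top_k)]

def summarize(segments):
    # N coarse-grained stages in tandem with 1 fine-grained stage.
    coarse = [coarse_stage(seg) for seg in segments]
    selected = keyword_select([kw for _, kw in coarse])
    fine_input = " ".join(s for s, _ in coarse)
    return fine_input, selected
```

In the real model, `fine_input` and the selected keywords would feed a final BART generation pass; the sketch only shows how the stages compose.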
Keywords