A study of extractive summarization of long documents incorporating local topic and hierarchical information

Ting Wang; Chuan Yang; Maoyang Zou; Jiaying Liang; Dong Xiang; Wenjie Yang; Hongyang Wang; Jia Li

doi:10.1038/s41598-024-60779-z

Scientific Reports (May 2024)

Affiliations

Ting Wang: School of Computer Science, Chengdu University of Information Technology
Chuan Yang: School of Computer Science, Chengdu University of Information Technology
Maoyang Zou: College of Blockchain Industry, Chengdu University of Information Technology
Jiaying Liang: School of Computer Science, Chengdu University of Information Technology
Dong Xiang: School of Computer Science, Chengdu University of Information Technology
Wenjie Yang: School of Computer Science, Chengdu University of Information Technology
Hongyang Wang: School of Computer Science, Chengdu University of Information Technology
Jia Li: School of Computer Science, Chengdu University of Information Technology

DOI: https://doi.org/10.1038/s41598-024-60779-z
Journal volume & issue: Vol. 14, no. 1, pp. 1–13

Abstract

In recent years, transformer-based language models have achieved remarkable success in extractive text summarization. However, this line of research still has limitations. First, transformer language models usually treat text as a linear sequence, ignoring its inherent hierarchical structure. Second, for long texts, traditional extractive models often focus on global topic information, which makes it difficult to capture and integrate local contextual information within topic segments. To address these issues, we propose a long-text extractive summarization model that employs a local topic information extraction module and a text hierarchical extraction module to capture the local topic information and the hierarchical structure information of the original document. Our approach enhances the ability to determine whether a sentence belongs to the summary. We use ROUGE scores as the evaluation metric and evaluate the model on three large public datasets. In these experiments, the model outperforms current mainstream summarization models in terms of ROUGE-1, ROUGE-2, and ROUGE-L scores, affirming the effectiveness of incorporating local topic information and document hierarchical structure into the model.

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
