Jisuanji kexue yu tansuo (Nov 2024)
Survey of Deep Learning Based Extractive Summarization
Abstract
Automatic text summarization (ATS) is an active research direction in natural language processing, and its methods fall into two main categories: extractive and abstractive. Extractive summarization builds the summary directly from text in the source document. Compared with abstractive summarization, it achieves higher grammatical and factual correctness, which gives it broad prospects in domains such as policy interpretation, official document summarization, law, and medicine. In recent years, extractive summarization based on deep learning has received extensive attention. This paper reviews recent research progress in deep-learning-based extractive summarization and analyzes related work on its two key steps: text unit encoding and summary extraction. First, text unit encoding methods are divided by model framework into four categories: hierarchical sequential encoding, encoding based on graph neural networks, fusion encoding, and pre-training-based encoding. Then, summary extraction methods are divided by extraction granularity into two categories: text-unit-level extraction and summary-level extraction. The paper also introduces commonly used public datasets and performance evaluation metrics for extractive summarization tasks. Finally, it discusses possible future research directions and the corresponding development trends in this field.
Keywords