Applied Sciences (Feb 2024)
TOMDS (Topic-Oriented Multi-Document Summarization): Enabling Personalized Customization of Multi-Document Summaries
Abstract
In a multi-document summarization task, if the user can decide on the summary topic, the generated summary can better align with the reader’s specific needs and preferences. This paper addresses the issue of overly general content generation by common multi-document summarization models and proposes a topic-oriented multi-document summarization (TOMDS) approach. The method is divided into two stages: extraction and abstraction. During the extractive stage, it primarily identifies and retrieves paragraphs relevant to the designated topic, subsequently sorting them based on their relevance to the topic and forming an initial subset of documents. In the abstractive stage, building upon the transformer architecture, the process includes two parts: encoding and decoding. In the encoding part, we integrated an external discourse parsing module that focuses on both micro-level within-paragraph semantic relationships and macro-level inter-paragraph connections, effectively combining these with the implicit relationships in the source document to produce more enriched semantic features. In the decoding part, we incorporated a topic-aware attention mechanism that dynamically zeroes in on information pertinent to the chosen topic, thus guiding the summary generation process more effectively. The proposed model was primarily evaluated using the standard text summary dataset WikiSum. The experimental results show that our model significantly enhanced the thematic relevance and flexibility of the summaries and improved the accuracy of grammatical and semantic comprehension in the generated summaries.
Keywords