PeerJ Computer Science (Nov 2024)

Unified extractive-abstractive summarization: a hybrid approach utilizing BERT and transformer models for enhanced document summarization

  • Divya S.,
  • Sripriya N.,
  • J. Andrew,
  • Manuel Mazzara

DOI
https://doi.org/10.7717/peerj-cs.2424
Journal volume & issue
Vol. 10
p. e2424

Abstract


With the exponential growth of digital documents, there is a pressing need for automated document summarization (ADS). Summarization is a compression technique that condenses a source document into concise sentences capturing its salient information. A primary challenge lies in producing a dependable summary, which depends on both the extracted features and human-established parameters. This article introduces a methodology that integrates extractive and abstractive techniques to improve the relevance of the summary to the input document. First, input sentences are transformed into representations using BERT, and these representations are arranged into a symmetric matrix based on their pairwise similarity. Semantically similar sentences are then extracted from this matrix to construct an extractive summary. The transformer model incorporates an objective function that is highly symmetric and invariant under unitary transformation for language generation. This model refines the extracted informative sentences and generates an abstractive summary resembling manually written summaries. We apply this hybrid summarization technique to the CNN/DailyMail and DUC2004 datasets and evaluate it using ROUGE metrics. The results show that the proposed technique outperforms conventional summarization methods.
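
The extractive stage outlined in the abstract (sentence representations, a symmetric similarity matrix, then sentence selection) might look roughly like the following minimal sketch. It is not the authors' implementation. It assumes sentence embeddings have already been computed (for example by BERT) and substitutes small hand-written vectors for them. It also assumes cosine similarity as the similarity measure and a simple "highest total similarity" rule for selection, since the abstract does not state the exact criterion. The names `cosine`, `similarity_matrix`, `select_sentences`, and `toy_embeddings` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(embeddings):
    """Build a symmetric n x n matrix of pairwise sentence similarities."""
    n = len(embeddings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0
        for j in range(i + 1, n):
            s = cosine(embeddings[i], embeddings[j])
            sim[i][j] = s  # fill both halves so the matrix stays symmetric
            sim[j][i] = s
    return sim


def select_sentences(sim, k):
    """Pick the k sentences most similar to the rest (assumed selection rule).

    Returns the chosen indices in original document order.
    """
    n = len(sim)
    scores = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Stand-in vectors; in practice these would be BERT sentence embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
]

sim = similarity_matrix(toy_embeddings)
selected = select_sentences(sim, k=2)
print(selected)
```

In this toy input the fourth vector points in a different direction from the other three, so it receives the lowest score and is left out. In the hybrid approach described above, the selected sentences would then be passed to the transformer model for abstractive rewriting, a step this sketch does not cover.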

Keywords