IEEE Access (Jan 2021)

Greedy Optimization Method for Extractive Summarization of Scientific Articles

  • Iskander Akhmetov,
  • Alexander Gelbukh,
  • Rustam Mussabayev

DOI
https://doi.org/10.1109/ACCESS.2021.3136302
Journal volume & issue
Vol. 9
pp. 168141 – 168153

Abstract

This work presents a method for summarizing scientific articles from the arXiv and PubMed datasets using a greedy extractive summarization algorithm. We used the approach together with Variable Neighborhood Search (VNS) to estimate the top line, that is, the upper bound on extractive text summarization quality in terms of ROUGE scores. The algorithm first selects for the summary those sentences of the text that contain the largest number of words with high TF-IDF values, and it tunes the minimum document frequency parameter of the TF-IDF vectorization. As a result, the method achieves ROUGE-1/ROUGE-2 scores of 0.43/0.12 on arXiv and 0.40/0.13 on PubMed. These results are comparable to those of state-of-the-art models that rely on complex neural network architectures, substantial computational resources, and large amounts of training data. In contrast, our method uses a straightforward statistical inference methodology.
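As a rough illustration of the selection step described in the abstract, the following minimal Python sketch ranks sentences by how many high-TF-IDF words they contain. It is not the authors' implementation: the function names, the smoothed IDF formula, the top_k cutoff for "high" TF-IDF words, and the tie-breaking rules are all assumptions made for this example, and min_df here simply drops words that occur in fewer than min_df documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_weights(corpus, doc_index, min_df=1):
    """Return {word: TF-IDF} for one document of the corpus.

    corpus is a list of documents, each a list of sentence strings.
    Words whose document frequency is below min_df are ignored.
    The smoothed IDF used here is an assumption of this sketch.
    """
    doc_tokens = [tokenize(" ".join(sentences)) for sentences in corpus]
    df = Counter()
    for tokens in doc_tokens:
        df.update(set(tokens))  # count each word once per document
    n_docs = len(corpus)
    tf = Counter(doc_tokens[doc_index])
    return {
        word: count * (math.log((1 + n_docs) / (1 + df[word])) + 1.0)
        for word, count in tf.items()
        if df[word] >= min_df
    }


def greedy_summary(corpus, doc_index, n_sentences=2, top_k=5, min_df=1):
    """Pick the sentences containing the most high-TF-IDF words.

    top_k (hypothetical parameter) sets how many of the highest-weighted
    words count as "high TF-IDF". Ties are broken alphabetically for
    words and by position for sentences; the chosen sentences are
    returned in their original document order.
    """
    weights = tfidf_weights(corpus, doc_index, min_df)
    top_words = set(sorted(weights, key=lambda w: (-weights[w], w))[:top_k])
    sentences = corpus[doc_index]
    hits = [len(top_words & set(tokenize(s))) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: (-hits[i], i))
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


if __name__ == "__main__":
    toy_corpus = [
        [
            "Greedy selection ranks sentences by keyword counts.",
            "The weather was nice.",
            "Keyword counts use greedy ranks of sentences.",
        ],
        ["The weather was cold.", "The weather was wet."],
    ]
    for sentence in greedy_summary(toy_corpus, 0, n_sentences=2):
        print(sentence)
```

Raising min_df changes which words can count as important, so it can change the selected sentences. That is one plausible reading of why the abstract treats this parameter as something to tune.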

Keywords