Natural Language Processing Journal (Sep 2024)

Summarizing long scientific documents through hierarchical structure extraction

  • Grishma Sharma,
  • Deepak Sharma,
  • M. Sasikumar

Journal volume & issue
Vol. 8
p. 100080

Abstract

In academia, keeping up with the latest advances has become increasingly difficult because of the rapid growth in scientific publications. Text summarization addresses this challenge by distilling a paper's essential contributions into a concise summary. Although scientific documents are highly structured, current summarization techniques often overlook this structural information. Our proposed method addresses this gap with an unsupervised, extractive, user-preference-based, hierarchical, iterative graph-based ranking algorithm for summarizing long scientific documents. Unlike existing approaches, our method leverages the inherent structure of scientific texts to generate diverse summaries tailored to user preferences. To assess the effectiveness of our approach, we evaluated it on two long-document datasets: ScisummNet and a custom dataset comprising papers from esteemed journals and conferences, with human-extracted sentences as gold summaries. The results, obtained using automatic ROUGE scores as well as human evaluation, show that our method performs better than other well-known unsupervised algorithms. This underscores the value of structural information in text summarization, enabling more effective and customizable solutions.

Keywords