IEEE Access (Jan 2024)

Integrating Topic-Aware Heterogeneous Graph Neural Network With Transformer Model for Medical Scientific Document Abstractive Summarization

  • Ayesha Khaliq,
  • Atif Khan,
  • Salman Afsar Awan,
  • Salman Jan,
  • Muhammad Umair,
  • Megat F. Zuhairi

DOI: https://doi.org/10.1109/ACCESS.2024.3443730
Journal volume & issue: Vol. 12, pp. 113855–113866

Abstract

The development of abstractive summarization methods is a crucial task in Natural Language Processing (NLP); it requires intelligent systems capable of extracting the main idea from a text and generating a coherent summary. Many existing abstractive approaches do not take the broader context into account or fail to capture global semantics when identifying salient content for the summary. Moreover, few studies have extensively evaluated abstractive summarization models for specific domains such as medical scientific document summarization. Motivated by this, this paper develops an integrated framework for abstractive summarization of medical scientific documents that combines a topic-aware Heterogeneous Graph Neural Network with a Transformer model. The proposed framework uses Latent Dirichlet Allocation (LDA) for topic modeling to uncover latent topics and global information, thereby preserving document-level attributes that are important for creating effective summaries. In addition to topic modeling, the framework employs a Heterogeneous Graph Neural Network (HGNN), which captures relationships between sentences through a graph-based document representation and allows local and global information to be updated concurrently. Finally, the framework is integrated with a Transformer decoder, which greatly enhances the model's ability to produce accurate and informative abstractive summaries. The performance of the proposed framework is evaluated on the publicly available PubMed dataset of medical scientific papers. Experimental results show that the proposed framework outperforms state-of-the-art models, achieving high F1-scores: 46.03 for ROUGE-1, 21.42 for ROUGE-2, and 39.71 for ROUGE-L. Our research makes a significant contribution to natural language processing, particularly medical scientific document summarization: it demonstrates superior performance, provides a deeper understanding of document structure, and has the potential to impact various applications by offering efficient access to information.
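
As a hedged illustration of the topic-modeling step described above, the minimal Python sketch below derives a per-sentence LDA topic distribution with gensim. It is not the authors' implementation; the toy sentences, the number of topics, and the training settings are illustrative assumptions only. Topic vectors of this kind are the sort of global, document-level signal a topic-aware heterogeneous graph encoder could attach to sentence nodes before a Transformer decoder generates the summary.

    # Minimal sketch (not the paper's code): per-sentence LDA topic vectors
    # that could serve as global features for a topic-aware graph encoder.
    # Assumes gensim is installed; all data and hyperparameters are placeholders.
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # Toy document split into tokenized sentences (real input: PubMed articles).
    sentences = [
        ["topic", "aware", "graph", "networks", "encode", "sentence", "relations"],
        ["transformer", "decoders", "generate", "abstractive", "summaries"],
        ["lda", "uncovers", "latent", "topics", "as", "global", "document", "signals"],
    ]

    dictionary = Dictionary(sentences)
    bow_corpus = [dictionary.doc2bow(tokens) for tokens in sentences]

    # Fit LDA; 20 topics is an arbitrary illustrative choice, not the paper's setting.
    num_topics = 20
    lda = LdaModel(corpus=bow_corpus, id2word=dictionary,
                   num_topics=num_topics, passes=10, random_state=0)

    def topic_vector(bow, model, k):
        # Dense topic distribution for one sentence (candidate HGNN node feature).
        vec = [0.0] * k
        for topic_id, prob in model.get_document_topics(bow, minimum_probability=0.0):
            vec[topic_id] = float(prob)
        return vec

    sentence_topic_features = [topic_vector(bow, lda, num_topics) for bow in bow_corpus]
    print(len(sentence_topic_features), len(sentence_topic_features[0]))  # 3 20

In a full pipeline along the lines the abstract describes, these topic vectors would be combined with sentence and word nodes in a heterogeneous graph, and the resulting encoder states would condition a Transformer decoder; summary quality would then be measured with ROUGE-1/2/L F1, as reported above.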

Keywords