IEEE Access (Jan 2024)

Predicting Tumor Type and Residual Status of Suprasellar Lesions Using Indian Discharge Summaries

  • Priyanka C. Nair,
  • Deepa Gupta,
  • Bhagavatula Indira Devi,
  • Vani Kanjirangat,
  • P. Deepak

DOI
https://doi.org/10.1109/ACCESS.2024.3460976
Journal volume & issue
Vol. 12
pp. 134379 – 134410

Abstract

A suprasellar lesion is an unusual mass in the suprasellar region of the brain. Common suprasellar lesions include Pituitary Adenoma, Craniopharyngioma, and Meningioma. Patients may present with significant visual symptoms as well as other symptoms such as headache and hormonal imbalances. The proposed study utilizes 553 discharge summaries of suprasellar patients admitted during 2013–2019 at NIMHANS hospitals, Bangalore. The discharge summaries were classified using 11 different word embedding techniques, including word2vec, FastText, GloVe, and transformer-based embeddings. Tumor type was predicted using advanced ML classifiers such as AdaBoost, Random Forest, and XGBoost. The highest F-score of 0.91 was reported for XGBoost when combined with SMOTE-based data balancing and PCA-based feature reduction. To further improve classification performance, ClinicalBioBERT, a pre-trained BERT model that demonstrated superior results, was fine-tuned with domain-specific clinical data, which improved the F-score to 0.93. The presence or absence of residual tumor after surgery was also classified using transformer models, achieving a maximum macro F1-score of 1 after the class imbalance was handled using SMOTE. Different combinations of experiments with PCA and SMOTE were carried out for both classification problems. Two large language models (LLMs), FlanT5 and Bloom, were also investigated for both classification problems. Initially, the LLMs were employed in a zero-shot classification pipeline, which resulted in poor performance. Consequently, the LLMs were fine-tuned on the discharge summary text, which improved performance.
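
The abstract relies on SMOTE to balance the classes before classification but does not say which implementation or parameters were used. As a rough illustration of the underlying idea only, the following minimal sketch generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, the parameter values, and the toy 2-D vectors are assumptions made for this example and are not taken from the study.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    Hypothetical helper illustrating the core SMOTE step: pick a minority
    sample, pick one of its k nearest minority neighbours, and place a new
    point at a random position on the line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy 2-D vectors standing in for document embeddings (not real data).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
majority_count = 10  # assumed size of the larger class

new_points = smote_oversample(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because each synthetic point lies on a segment between two existing minority samples, it stays within the region already occupied by that class. In practice, a maintained library implementation would normally be preferred over a hand-written one like this.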

Keywords