IEEE Access (Jan 2023)
HSGA: A Hybrid LSTM-CNN Self-Guided Attention to Predict the Future Diagnosis From Discharge Narratives
Abstract
Predicting a patient’s readmission and forecasting future diagnoses are critical tasks in inferring clinical outcomes. The discharge summaries recorded in Electronic Health Records (EHRs) are information-rich, but they are also heterogeneous, sparse, noisy, and biased, which hinders learning algorithms that aim to extract actionable insights from them. Existing approaches use the current admission’s International Classification of Diseases (ICD) codes as input features, but these codes do not fully describe the patient’s progression. Other systems apply attention mechanisms directly to the notes without the guidance of domain knowledge, resulting in distorted predictions. In this work, we propose a hybrid LSTM-CNN self-guided attention model that uses discharge narratives to predict the ICD diagnosis likely to cause the next readmission within 90 days of the current discharge. Because the notes contain unnecessary tokens, the model leverages recent advances in deep learning to reduce the number of tokens considered when predicting the patient’s future diagnosis. We use a 1D CNN (1-Dimensional Convolutional Neural Network) to capture features from the whole note while, concurrently, an LSTM (Long Short-Term Memory) network extracts features from clinically meaningful Concept Unique Identifiers (CUIs) that are fetched from the note itself to build a knowledge base. This textual knowledge base guides the learning module toward the n-grams in the note to focus on for prediction. We consider three prediction scenarios: diagnosis category prediction, the probability of occurrence of one of the top 20 disease conditions, and ICD9 code prediction.
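The guidance step described above can be illustrated with a minimal sketch. It assumes that the CNN yields one feature vector per n-gram, that the LSTM summarizes the CUI knowledge base as a single vector, and that the two are compared with a plain dot product followed by a softmax. The function names, the scoring function, and the toy numbers are illustrative assumptions, not the authors' exact formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_attention(ngram_features, knowledge_vector):
    """Weight n-gram features by their similarity to a knowledge vector.

    ngram_features: list of equal-length vectors (assumed CNN outputs).
    knowledge_vector: one vector (assumed LSTM summary of the CUIs).
    Returns the attention weights and the weighted sum of the features.
    """
    # Dot-product score between each n-gram and the knowledge vector.
    scores = [
        sum(f * k for f, k in zip(feat, knowledge_vector))
        for feat in ngram_features
    ]
    weights = softmax(scores)
    dim = len(knowledge_vector)
    # Context vector: attention-weighted sum of the n-gram features.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, ngram_features))
        for d in range(dim)
    ]
    return weights, context


# Toy input (made up): three n-grams with 3-dimensional features.
ngram_features = [[0.1, 0.0, 0.2], [0.9, 0.8, 0.1], [0.0, 0.1, 0.0]]
knowledge_vector = [1.0, 1.0, 0.0]
weights, context = guided_attention(ngram_features, knowledge_vector)
```

In this toy input the second n-gram aligns most closely with the knowledge vector, so it receives the largest weight and dominates the context vector passed on to a classifier.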
For diagnosis category prediction, our proposed model achieves a macro-average ROC of 0.82 and a micro-average ROC of 0.79; for prediction of the 20 most frequent diseases, an AUROC of 0.87; and for ICD9 code prediction, a macro-average Recall of 0.8 and a micro-average Recall of 0.84. The predictive accuracy of the model is assessed through the prediction of heart failure onset. For all these prediction scenarios, the results show that the hybrid approach outperforms the existing baselines.
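The difference between the macro- and micro-averaged Recall reported above can be shown with a small multi-label sketch. Macro-averaging takes the mean of the per-code recalls, so rare codes count as much as frequent ones. Micro-averaging pools true positives and false negatives over all codes, so frequent codes dominate. The helper name and the toy labels are made up for illustration and are unrelated to the paper's data.

```python
def recall_scores(y_true, y_pred):
    """Return (macro, micro) recall for multi-label predictions.

    y_true, y_pred: lists of label sets, one set per sample.
    Only labels that occur in y_true are scored, so every per-label
    denominator is non-zero.
    """
    labels = sorted(set().union(*y_true))
    tp = {label: 0 for label in labels}  # true positives per label
    fn = {label: 0 for label in labels}  # false negatives per label
    for truth, pred in zip(y_true, y_pred):
        for label in truth:
            if label in pred:
                tp[label] += 1
            else:
                fn[label] += 1
    per_label = [tp[l] / (tp[l] + fn[l]) for l in labels]
    macro = sum(per_label) / len(per_label)
    total_tp = sum(tp.values())
    micro = total_tp / (total_tp + sum(fn.values()))
    return macro, micro


# Toy labels (made up): one frequent code and one rarer code.
y_true = [{"428"}, {"428", "250"}, {"428"}, {"250"}]
y_pred = [{"428"}, {"428"}, {"428"}, {"401"}]
macro, micro = recall_scores(y_true, y_pred)
```

Here the frequent code is always recovered and the rarer one never is, so the macro average is 0.5 while the micro average is 0.6.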
Keywords