BMC Pediatrics (Nov 2024)
An explainable deep learning model to predict partial anomalous pulmonary venous connection for patients with atrial septal defect
Abstract
Background
Patients with partial anomalous pulmonary venous connection (PAPVC) are usually asymptomatic and present with intricate anatomical types, which leads to missed diagnoses among patients with atrial septal defect (ASD). The present study aimed to explore the predictive variables of PAPVC in patients with ASD and to construct an explainable prediction model based on deep learning.
Methods
This retrospective study included 834 inpatients with ASD at Women and Children's Hospital, Qingdao University, from January 2018 to January 2023. They were separated into two groups based on the presence of PAPVC. Propensity score matching and the synthetic minority oversampling technique (SMOTE) were used to balance the baseline data between groups. The differential variables between the two groups were determined by univariate logistic regression. The patients were randomly divided into a training set and a validation set in a ratio of 8:2. Support vector machine (SVM), random forest, decision tree, XGBoost, and LightGBM models were built from the differential variables, and their classification performance was compared. Split, gain, and SHAP were used to measure the importance of the differential variables and improve the interpretability of the model. Moreover, a portion of the patients was included in the validation set to test the performance of the selected models.
Results
Three hundred twenty-eight patients with ASD and 82 patients with PAPVC were included in the training set and the validation set, respectively. Ten differential variables were selected by univariate logistic regression: right atrial diameter (longitudinal axis and transverse axis), right ventricular diameter, left atrial diameter, left ventricular end-diastolic diameter, left ventricular end-systolic diameter, P-wave voltage, P-wave interval, PR interval, and QRS-wave voltage. Among the classification models built from the differential variables, the LightGBM model achieved the highest performance on the validation set (AUC = 0.93). Based on variable importance analysis, the LightGBM-Clinic model was retrained on P-wave voltage, P-wave interval, PR interval, QRS wave interval, and right ventricular diameter, and performed excellently (AUC = 0.90). The AUC of the LightGBM-Clinic model was 0.87 in the test set.
Conclusion
In this study, the LightGBM model performed excellently in determining whether patients with ASD also have PAPVC. ECG parameters such as P-wave voltage contributed substantially to the predictive value and enhanced the explainability of the model.
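Two steps named in the abstract, the random 8:2 split and the AUC used to compare models, can be illustrated with a minimal sketch. This is not the authors' code: the function names `train_validation_split` and `roc_auc`, the fixed seed, and the toy labels and scores are all assumptions made for illustration, and the study's actual models (LightGBM and the others) are not reproduced here.

```python
import random


def train_validation_split(records, train_fraction=0.8, seed=0):
    """Shuffle records and split them into training and validation lists.

    Hypothetical helper: train_fraction=0.8 mirrors the 8:2 ratio in the abstract.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case outscores a random
    negative case, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data only: ten placeholder patient IDs, four labels, four model scores.
train_ids, validation_ids = train_validation_split(range(10))
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

With the toy scores above, three of the four positive-negative pairs are ranked correctly, so `toy_auc` is 0.75. The pairwise definition is quadratic in the number of cases, which is fine for a cohort of a few hundred patients but would normally be replaced by a rank-based computation for larger data.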
Keywords