Respiratory Research (May 2024)

A comprehensive study on machine learning models combining with oversampling for bronchopulmonary dysplasia-associated pulmonary hypertension in very preterm infants

  • Dan Wang,
  • Shuwei Huang,
  • Jingke Cao,
  • Zhichun Feng,
  • Qiannan Jiang,
  • Wanxian Zhang,
  • Jia Chen,
  • Shelby Kutty,
  • Changgen Liu,
  • Wenyu Liao,
  • Le Zhang,
  • Guli Zhu,
  • Wenhao Guo,
  • Jie Yang,
  • Lin Liu,
  • Jingwei Yang,
  • Qiuping Li

DOI
https://doi.org/10.1186/s12931-024-02797-z
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 10

Abstract

Background: Bronchopulmonary dysplasia-associated pulmonary hypertension (BPD-PH) remains a devastating clinical complication that seriously affects the therapeutic outcome of preterm infants. Early prevention and timely diagnosis prior to pathological change are therefore key to reducing morbidity and improving prognosis. Our primary objective is to use machine learning techniques to build predictive models that can accurately identify BPD infants at risk of developing PH.

Methods: The data used in this study were collected from the neonatology departments of four tertiary-level hospitals in China. To address the issue of imbalanced data, an oversampling algorithm, the synthetic minority over-sampling technique (SMOTE), was applied to improve the model.

Results: Seven hundred sixty-one clinical records were collected in our study. Following data pre-processing and feature selection, 5 of the 46 features were used to build models: duration of invasive respiratory support (days), severity of BPD, ventilator-associated pneumonia, pulmonary hemorrhage, and early-onset PH. Four machine learning models were trained, and one was ultimately selected after comprehensive comparison. The selected model achieved 93.8% sensitivity, 85.0% accuracy, and an AUC of 0.933. A logistic regression formula score greater than 0 was identified as a warning sign of BPD-PH.

Conclusions: We comprehensively compared different machine learning models and ultimately obtained a well-performing prognostic model that was sufficient to support pediatric clinicians in making an early diagnosis and formulating a better treatment plan for pediatric patients with BPD-PH.
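The two technical ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the sample values, the function names `smote_sample` and `logistic_probability`, and the parameters `k`, `n_new`, and `seed` are all hypothetical, and no coefficients from the study are used. The first function shows the core SMOTE step (a synthetic minority sample is interpolated between an existing minority sample and one of its nearest minority neighbours). The second shows why a logistic regression score above 0 is a natural cut-off: a score of 0 corresponds to a predicted probability of 0.5.

```python
import math
import random


def smote_sample(minority, k=3, n_new=4, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def logistic_probability(score):
    """Map a logistic-regression linear score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical, made-up two-feature minority samples (not study data).
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sample(minority)

# A score above 0 is equivalent to a predicted probability above 0.5.
flagged = logistic_probability(0.7) > 0.5
```

Because each synthetic point is a convex combination of two real minority samples, it always lies on the segment between them. In practice a maintained library implementation of SMOTE would normally be preferred over a hand-written one, and oversampling should be applied to the training split only.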

Keywords