BMC Medical Informatics and Decision Making (Nov 2022)

Machine learning methods to predict 30-day hospital readmission outcome among US adults with pneumonia: analysis of the national readmission database

  • Yinan Huang,
  • Ashna Talwar,
  • Ying Lin,
  • Rajender R. Aparasu

DOI
https://doi.org/10.1186/s12911-022-01995-3
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 14

Abstract

Background: Hospital readmissions for pneumonia are a growing concern in the US, with significant consequences for costs and quality of care. This study developed a rule-based model and other machine learning (ML) models to predict 30-day readmission risk in patients with pneumonia and compared their performance.

Methods: This population-based study involved patients aged ≥ 18 years hospitalized with pneumonia from January 1, 2016, through November 30, 2016, using the Healthcare Cost and Utilization Project-National Readmission Database (HCUP-NRD). Rule-based algorithms and other ML algorithms, specifically decision trees, random forest, extreme gradient boosting (XGBoost), and Least Absolute Shrinkage and Selection Operator (LASSO), were used to model all-cause readmissions within 30 days of discharge from the index pneumonia hospitalization. A total of 61 clinically relevant variables were included for ML model development. Models were trained on a randomly selected 50% of the data and evaluated on the remaining 50%. Model hyperparameters were tuned using ten-fold cross-validation on the resampled training dataset. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were calculated on the testing set to evaluate model performance.

Results: Of the 372,293 patients with an index hospitalization for pneumonia, 48,280 (12.97%) were readmitted within 30 days. Judged by AUROC in the testing data, the rule-based model (0.6591) significantly outperformed the decision tree (0.5783, p value < 0.001), random forest (0.6509, p value < 0.01), and LASSO (0.6087, p value < 0.001), but performed slightly worse than XGBoost (0.6606, p value = 0.015). The AUPRC of the rule-based model in the testing data (0.2146) was higher than that of the decision tree (0.1560), random forest (0.2052), and LASSO (0.2042), and was nearly identical to that of XGBoost (0.2147). The top risk-predictive rules captured by the rule-based algorithm involved comorbidities, illness severity, disposition locations, payer type, age, and length of stay. The other ML models also assigned these risk factors high variable importance.

Conclusion: The performance of machine learning models for predicting readmission in pneumonia patients varied. XGBoost outperformed the rule-based model on AUROC, although the difference was small (0.6606 vs. 0.6591). However, the important risk factors for predicting readmission remained consistent across ML models.
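The two evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code and does not use HCUP-NRD data: the function names `auroc` and `average_precision`, the synthetic Gaussian risk scores, and the sample size are all assumptions made for illustration. Only the 12.97% readmission rate is taken from the abstract. The sketch also shows why AUPRC is best read against the outcome prevalence: an uninformative score has an expected AUPRC close to the readmission rate, whereas its AUROC is 0.5.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes ties are rare)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


# Synthetic cohort (hypothetical): readmission rate taken from the abstract,
# risk scores drawn from two overlapping Gaussians to mimic a weak classifier.
rng = random.Random(0)
READMISSION_RATE = 0.1297
labels = [1 if rng.random() < READMISSION_RATE else 0 for _ in range(20000)]
scores = [rng.gauss(0.5 if y == 1 else 0.0, 1.0) for y in labels]

prevalence = sum(labels) / len(labels)
roc = auroc(labels, scores)
ap = average_precision(labels, scores)

print(f"prevalence (expected AUPRC of a random score): {prevalence:.4f}")
print(f"AUROC: {roc:.4f}")
print(f"AUPRC (average precision): {ap:.4f}")
```

Read this way, the reported AUPRC values of roughly 0.21 sit above the 0.1297 readmission rate that an uninformative model would be expected to achieve, which is consistent with the modest AUROC values of about 0.66.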

Keywords