Informatics in Medicine Unlocked (Jan 2023)

Use of claims data to predict the inpatient length of stay among U.S. stroke patients

  • Xiaobo Quan
  • Deepika Gopukumar

Journal volume & issue
Vol. 42
p. 101337

Abstract

Background: Predicting inpatient length of stay (LOS) for stroke patients attracts considerable attention from researchers worldwide because it facilitates resource management and improves care. However, previous studies have largely underutilized claims data for predicting stroke LOS, and predictive models for the U.S. stroke population remain understudied.

Purpose: To evaluate the feasibility of using claims data for stroke LOS prediction and to identify new important predictors of LOS in U.S. stroke patients.

Methods: Data preparation, analysis, and predictive modeling were conducted on a retrospective dataset comprising claims and electronic health record (EHR) data on acute care stroke admissions during 2010–2018. Two tree-based models, eXtreme Gradient Boosting (XGBoost) and Categorical Boosting (CatBoost), were trained with 10-fold cross-validation and compared with a baseline model that did not include any predictors. Predictive performance was evaluated on the holdout set using mean absolute error (MAE) and root mean squared error (RMSE). Importance plots and SHAP (SHapley Additive exPlanations) plots were used to identify the important predictors.

Results: A total of 6102 stroke patients were included, with an average LOS of 6.4 days. The predictive models built using claims data alone (RMSE: 1.627; MAE: 0.530) performed about as well as those built on the entire dataset, which included additional variables from the EHR (RMSE: 1.622; MAE: 0.533). Important predictors were admission channel and type, comorbidities (e.g., acute respiratory failure), medical services used (e.g., critical care, ambulance), facility characteristics (e.g., type, size), patient demographics, and patient socioeconomic status. Of these, admission channel and type, medical services including critical care and ambulance, facility type, and patient socioeconomic status were newly identified predictors that had not been studied before.

Conclusion: Claims data are suitable for stroke LOS prediction. The newly identified important predictors from this study could be integrated with key predictors identified in previous research to improve prediction, thereby supporting better stroke care management.
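
The two error metrics and the predictor-free baseline named in the Methods can be illustrated with a minimal sketch. It assumes the baseline simply predicts the mean LOS of the training data, which the abstract does not state; the function names and the toy LOS values are hypothetical and are not taken from the study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))

def baseline_predict(train_los, n):
    """Predictor-free baseline (assumption): always predict the training mean."""
    mean_los = sum(train_los) / len(train_los)
    return [mean_los] * n

# Toy LOS values in days, for illustration only (not study data).
train_los = [2.0, 4.0, 6.0, 8.0]
holdout_los = [3.0, 5.0, 10.0]

baseline_pred = baseline_predict(train_los, len(holdout_los))
baseline_mae = mae(holdout_los, baseline_pred)
baseline_rmse = rmse(holdout_los, baseline_pred)
print(f"baseline MAE={baseline_mae:.3f}, RMSE={baseline_rmse:.3f}")
```

A trained model's holdout predictions would be scored with the same two functions and compared against these baseline values. Because RMSE squares each error before averaging, it penalizes a few very long stays more heavily than MAE does, which is one reason to report both.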

Keywords