Journal of Clinical and Translational Science (Apr 2022)
266 Inpatient Quality Indicators Risk-Adjustment Using Interactions Selected by Machine Learning Methods
Abstract
OBJECTIVES/GOALS: Predictive models for health outcomes are often poorly calibrated, potentially because standard methods ignore interactions. Using the AHRQ models for Inpatient Quality Indicator (IQI) 11 Abdominal Aortic Aneurysm Repair mortality and IQI 09 Pancreatic Resection mortality, we hypothesize that identifying interactions may improve model calibration. METHODS/STUDY POPULATION: We used adult discharge data from 16 states obtained from the AHRQ Healthcare Cost and Utilization Project (State Inpatient Database), the California Department of Health Care Access and Information, and the New York State Department of Health. We used AHRQ’s v2021-1 Clinical Classifications Software Refined (CCSR) with present-on-admission flags to create features for risk adjustment. After splitting the data into training and test sets, we compared the performance of least absolute shrinkage and selection operator (LASSO) models with that of first-order interaction models estimated using Hierarchical Group Lasso Regression (HGLR). C-statistics, areas under the precision-recall curve, and Hosmer-Lemeshow calibration plots are reported. Finally, logistic regression models with the selected CCSRs were evaluated on the test set. RESULTS/ANTICIPATED RESULTS: IQI 11 has four strata: open and endovascular repair of ruptured aneurysms (39% and 21% mortality, respectively) and open and endovascular repair of unruptured aneurysms (6% and 0.8% mortality, respectively). IQI 09 has two strata: with and without pancreatic cancer (2% and 2.5% mortality, respectively). Compared with the LASSO models (without interactions), the HGLR models (with interaction effects) showed meaningful improvements in discrimination and calibration. However, for IQI 09, neither HGLR nor LASSO produced a good model, owing to the extremely low mortality rate.
The novel HGLR method identified interactions involving CCSRs and improved model performance for IQI 11, whose heterogeneous population has a mix of high and low event rates, but not for the more homogeneous patient population of IQI 09. DISCUSSION/SIGNIFICANCE: Standard implementations of regression models fail to address two critical issues that arise in healthcare data: (a) a quadratic explosion of potential interactions that cannot be identified manually, and (b) categorical variables with multiple levels or values (e.g., age categories). We propose an innovative use of HGLR to address these issues robustly.
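
As a rough illustration of two ideas mentioned above, the following Python sketch counts the candidate first-order interactions among p features (the "quadratic explosion") and computes a C-statistic and a Hosmer-Lemeshow-style calibration table on toy data. It is not the authors' code: the function names, the feature count of 500, and the toy outcomes and predicted risks are all hypothetical, and grouping by equal-sized risk bins is only one common way to build a calibration plot.

```python
from math import comb


def n_pairwise_interactions(p):
    """Number of distinct two-way (first-order) interactions among p features."""
    return comb(p, 2)


def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; ties count as one half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in pos:
        for p_nonevent in neg:
            if p_event > p_nonevent:
                score += 1.0
            elif p_event == p_nonevent:
                score += 0.5
    return score / (len(pos) * len(neg))


def calibration_table(y_true, y_prob, n_groups=10):
    """Sort by predicted risk, split into n_groups roughly equal groups, and
    return (mean predicted risk, observed event rate, group size) per group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than observations
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed, len(chunk)))
    return rows


# Hypothetical feature count, chosen only to show the scale of the search space.
n_candidates = n_pairwise_interactions(500)

# Toy outcomes (1 = died) and predicted risks; not data from the study.
y = [0, 0, 1, 0, 1, 1]
risk = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]

c_stat = c_statistic(y, risk)
table = calibration_table(y, risk, n_groups=2)

print("candidate interactions:", n_candidates)
print("C-statistic:", round(c_stat, 3))
for mean_pred, observed, size in table:
    print(f"predicted {mean_pred:.3f}  observed {observed:.3f}  n={size}")
```

A well-calibrated model would show predicted and observed rates close to each other in every risk group; plotting the two columns of the table against each other gives the usual calibration plot.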