PLoS Medicine (Nov 2018)

Enhancing the prediction of acute kidney injury risk after percutaneous coronary intervention using machine learning techniques: A retrospective cohort study.

  • Chenxi Huang,
  • Karthik Murugiah,
  • Shiwani Mahajan,
  • Shu-Xia Li,
  • Sanket S Dhruva,
  • Julian S Haimovich,
  • Yongfei Wang,
  • Wade L Schulz,
  • Jeffrey M Testani,
  • Francis P Wilson,
  • Carlos I Mena,
  • Frederick A Masoudi,
  • John S Rumsfeld,
  • John A Spertus,
  • Bobak J Mortazavi,
  • Harlan M Krumholz

DOI
https://doi.org/10.1371/journal.pmed.1002703
Journal volume & issue
Vol. 15, no. 11
p. e1002703

Abstract

BACKGROUND: The current acute kidney injury (AKI) risk prediction model for patients undergoing percutaneous coronary intervention (PCI) from the American College of Cardiology (ACC) National Cardiovascular Data Registry (NCDR) employed regression techniques. This study aimed to evaluate whether models using machine learning techniques could significantly improve AKI risk prediction after PCI.

METHODS AND FINDINGS: We used the same cohort and candidate variables used to develop the current NCDR CathPCI Registry AKI model, including 947,091 patients who underwent PCI procedures between June 1, 2009, and June 30, 2011. The mean age of these patients was 64.8 years, and 32.8% were women, with a total of 69,826 (7.4%) AKI events. We replicated the current AKI model as the baseline model and compared it with a series of new models. Temporal validation was performed using data from 970,869 patients undergoing PCIs between July 1, 2015, and March 31, 2017, with a mean age of 65.7 years; 31.9% were women, and 72,954 (7.5%) had AKI events. Each model was derived by implementing one of two strategies for preprocessing candidate variables (preselecting and transforming candidate variables or using all candidate variables in their original forms), one of three variable-selection methods (stepwise backward selection, lasso regularization, or permutation-based selection), and one of two methods to model the relationship between variables and outcome (logistic regression or gradient descent boosting). The cohort was divided into different training (70%) and test (30%) sets using 100 different random splits, and the performance of the models was evaluated internally in the test sets. The best model, according to the internal evaluation, was derived by using all available candidate variables in their original form, permutation-based variable selection, and gradient descent boosting.
Compared with the baseline model that uses 11 variables, the best model used 13 variables and achieved a significantly better area under the receiver operating characteristic curve (AUC) of 0.752 (95% confidence interval [CI] 0.749-0.754) versus 0.711 (95% CI 0.708-0.714), a significantly better Brier score of 0.0617 (95% CI 0.0615-0.0618) versus 0.0636 (95% CI 0.0634-0.0638), and a better calibration slope of observed versus predicted rate of 1.008 (95% CI 0.988-1.028) versus 1.036 (95% CI 1.015-1.056). The best model also had a significantly wider predictive range (25.3% versus 21.6%, p < 0.001) and was more accurate in stratifying AKI risk for patients. Evaluated on a more contemporary CathPCI cohort (July 1, 2015-March 31, 2017), the best model consistently achieved significantly better performance than the baseline model in AUC (0.785 versus 0.753), Brier score (0.0610 versus 0.0627), calibration slope (1.003 versus 1.062), and predictive range (29.4% versus 26.2%). The current study does not address implementation for risk calculation at the point of care, and potential challenges include the availability and accessibility of the predictors.

CONCLUSIONS: Machine learning techniques and data-driven approaches resulted in improved prediction of AKI risk after PCI. The results support the potential of these techniques for improving risk prediction models and identification of patients who may benefit from risk-mitigation strategies.
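
As a rough illustration of the evaluation described in the abstract, the sketch below shows how a random 70%/30% split and the three reported metrics (AUC, Brier score, calibration slope) could be computed in plain Python. It is not the authors' code. The function names (`split_70_30`, `auc`, `brier_score`, `calibration_slope`) are hypothetical, and the calibration slope is assumed here to be the least-squares slope of observed event rate on mean predicted risk across equal-sized risk groups; the abstract does not specify the exact procedure.

```python
import random


def split_70_30(n, seed):
    """Return (train, test) index lists for one random 70%/30% split."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(0.7 * n)
    return idx[:cut], idx[cut:]


def auc(y_true, y_score):
    """AUC via the rank-sum (Mann-Whitney) formula; ties get average ranks.

    Assumes both outcome classes (0 and 1) are present.
    """
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_slope(y_true, y_prob, n_groups=10):
    """Assumed definition: least-squares slope of observed event rate on
    mean predicted risk across n_groups equal-sized risk groups."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    size = len(order) / n_groups
    xs, ys = [], []
    for g in range(n_groups):
        members = order[int(g * size):int((g + 1) * size)]
        if not members:
            continue
        xs.append(sum(y_prob[i] for i in members) / len(members))
        ys.append(sum(y_true[i] for i in members) / len(members))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under these assumptions, calling `split_70_30(n, seed)` with 100 different seeds, fitting a model on each training set, and averaging the three metrics over the corresponding test sets would mirror the internal evaluation described above. A calibration slope near 1 indicates that predicted and observed rates agree across risk groups.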