Clinical Epidemiology (Mar 2023)

A Novel Chronic Kidney Disease Phenotyping Algorithm Using Combined Electronic Health Record and Claims Data

  • Mansour O,
  • Paik JM,
  • Wyss R,
  • Mastrorilli JM,
  • Bessette LG,
  • Lu Z,
  • Tsacogianis T,
  • Lin KJ

Journal volume & issue
Vol. 15
pp. 299–307

Abstract


Omar Mansour,1,* Julie M Paik,1–3,* Richard Wyss,1 Julianna M Mastrorilli,1 Lily Gui Bessette,1 Zhigang Lu,1 Theodore Tsacogianis,1 Kueiyu Joshua Lin1,4

1 Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA; 2 Renal Division, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA; 3 New England Geriatric Research Education and Clinical Center, VA Boston Healthcare System, Boston, MA, USA; 4 Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

*These authors contributed equally to this work.

Correspondence: Kueiyu Joshua Lin, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 1620 Tremont St. Suite 3030, Boston, MA, 02120, USA, Tel +1 617 278-0930, Fax +1 617 232-8602, Email [email protected]

Purpose: Because chronic kidney disease (CKD) is often under-coded as a diagnosis in claims data, we aimed to develop claims-based prediction models for CKD phenotypes determined by laboratory results in electronic health records (EHRs).

Patients and Methods: We linked EHRs from two networks (used as the training and validation cohorts, respectively) with Medicare claims data. The study cohort included individuals aged ≥ 65 years with a valid serum creatinine result in the EHR from 2007 to 2017, excluding patients with end-stage kidney disease or receiving dialysis. We used LASSO regression to select among 134 candidate predictors for predicting continuous estimated glomerular filtration rate (eGFR). We assessed model performance in predicting the eGFR categories < 60, < 45, and < 30 mL/min/1.73 m² in terms of the area under the receiver operating characteristic curve (AUC).

Results: The model training cohort included 117,476 patients (mean age 74.8 years; 58.2% female) and the validation cohort included 56,744 patients (mean age 73.8 years; 59.6% female).
In the validation cohort, the AUC of the primary model (113 predictors; adjusted R² = 0.35) for predicting eGFR < 60, < 45, and < 30 mL/min/1.73 m² was 0.81, 0.88, and 0.92, respectively, and the corresponding positive predictive values for these three phenotypes were 0.80 (95% confidence interval: 0.79, 0.81), 0.79 (0.75, 0.84), and 0.38 (0.30, 0.45).

Conclusion: We developed a claims-based model to determine clinical phenotypes of CKD stages defined by eGFR values. Researchers without access to laboratory results can use the model-predicted phenotypes as a proxy clinical endpoint or confounder, and to enhance the assessment of subgroup effects.

Keywords: EHR, prediction, RPDR
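The abstract gives no implementation details, so the following is only a minimal pure-Python sketch, not the authors' code, of the two steps it describes: fitting a LASSO (L1-penalized) linear model to a continuous eGFR outcome, where predictors whose coefficients shrink to exactly zero are effectively dropped, and then evaluating the continuous predictions as binary phenotypes (for example eGFR < 60 mL/min/1.73 m²) by AUC and PPV. The function names are hypothetical, the penalty `lam` is an assumed fixed value (in practice it would typically be tuned by cross-validation), and flagging a patient as positive when the predicted eGFR falls below the same cutoff is an assumption, since the abstract does not say how the PPV cutoff was chosen.

```python
from typing import List, Sequence, Tuple


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam; values in [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def fit_lasso(X: List[List[float]], y: List[float], lam: float,
              n_iter: int = 200) -> Tuple[float, List[float]]:
    """Cyclic coordinate descent for
    (1 / 2n) * ||y - b0 - X b||^2 + lam * ||b||_1.

    Columns are centred so the intercept b0 is not penalised. Coefficients
    that end at exactly 0 correspond to predictors dropped from the model.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [yi - y_mean for yi in y]  # residual while every b_j is 0
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant predictor carries no information
            # Correlation of predictor j with the residual that excludes its own term.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta  # keep residual in sync with b
                b[j] = new_bj
    intercept = y_mean - sum(b[j] * x_mean[j] for j in range(p))
    return intercept, b


def predict(intercept: float, b: Sequence[float], X: List[List[float]]) -> List[float]:
    """Predicted continuous eGFR for each row of X."""
    return [intercept + sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def auc_for_cutoff(true_egfr: Sequence[float], pred_egfr: Sequence[float],
                   cutoff: float) -> float:
    """AUC for the phenotype 'true eGFR < cutoff'.

    Lower predicted eGFR means higher predicted risk, so a case 'wins' a pair
    when its prediction is below the non-case's; ties count as half a win.
    """
    cases = [p for t, p in zip(true_egfr, pred_egfr) if t < cutoff]
    non_cases = [p for t, p in zip(true_egfr, pred_egfr) if t >= cutoff]
    if not cases or not non_cases:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for pc in cases:
        for pn in non_cases:
            if pc < pn:
                wins += 1.0
            elif pc == pn:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def ppv_for_cutoff(true_egfr: Sequence[float], pred_egfr: Sequence[float],
                   cutoff: float) -> float:
    """PPV when a patient is flagged positive if predicted eGFR < cutoff (assumed rule)."""
    flagged = [t for t, p in zip(true_egfr, pred_egfr) if p < cutoff]
    if not flagged:
        return float("nan")
    return sum(1 for t in flagged if t < cutoff) / len(flagged)
```

With the toy values true eGFR = [20, 40, 50, 70, 80] and predicted eGFR = [30, 35, 65, 55, 90], `auc_for_cutoff(..., 60)` returns 5/6 and `ppv_for_cutoff(..., 60)` returns 2/3. The pairwise AUC loop is O(n²) and is meant only to make the definition explicit; it would be slow on cohorts of the size reported above.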
