BMC Medical Research Methodology (Feb 2020)

Development and validation of algorithms to classify type 1 and 2 diabetes according to age at diagnosis using electronic health records

  • Calvin Ke,
  • Thérèse A. Stukel,
  • Andrea Luk,
  • Baiju R. Shah,
  • Prabhat Jha,
  • Eric Lau,
  • Ronald C. W. Ma,
  • Wing-Yee So,
  • Alice P. Kong,
  • Elaine Chow,
  • Juliana C. N. Chan

DOI
https://doi.org/10.1186/s12874-020-00921-3
Journal volume & issue
Vol. 20, no. 1
pp. 1–15

Abstract

Background
Validated algorithms to classify type 1 and type 2 diabetes (T1D, T2D) are mostly limited to white pediatric populations. We conducted a large study in Hong Kong among children and adults with diabetes to develop and validate algorithms that use electronic health records (EHRs) to classify diabetes type, with clinical assessment as the reference standard, and to evaluate their performance by age at diagnosis.

Methods
We included all people with diabetes (age at diagnosis 1.5–100 years during 2002–15) in the Hong Kong Diabetes Register and randomized them to derivation and validation cohorts. We developed candidate algorithms to identify diabetes type using encounter codes, prescriptions, and combinations of these criteria (“combination algorithms”). We identified 3 algorithms with the highest sensitivity, positive predictive value (PPV), and kappa coefficient, and evaluated their performance by age at diagnosis in the validation cohort.

Results
There were 10,196 people (T1D n = 60, T2D n = 10,136) in the derivation cohort and 5,101 (T1D n = 43, T2D n = 5,058) in the validation cohort. Mean age at diagnosis was 22.7 years for T1D and 55.9 years for T2D, and 53.3% and 43.9%, respectively, were female. Algorithms using codes or prescriptions classified T1D well for age at diagnosis < 20 years, but sensitivity and PPV dropped for older ages at diagnosis. Combination algorithms maximized either sensitivity or PPV, but not both. Across all ages, the three selected algorithms performed as follows for T1D (95% confidence intervals in parentheses):
  • “High sensitivity for type 1” (ratio of type 1 to type 2 codes ≥ 4, or at least 1 insulin prescription within 90 days): sensitivity 95.3% (84.2–99.4%), PPV 12.8% (9.3–16.9%).
  • “High PPV for type 1” (ratio of type 1 to type 2 codes ≥ 4, and multiple daily injections with no other glucose-lowering medication prescription): PPV 100.0% (79.4–100.0%), sensitivity 37.2% (23.0–53.3%).
  • “Optimized” (ratio of type 1 to type 2 codes ≥ 4, and at least 1 insulin prescription within 90 days): sensitivity 65.1% (49.1–79.0%), PPV 75.7% (58.8–88.2%).
Accuracy of T2D classification was high for all algorithms.

Conclusions
Our validated set of algorithms accurately classifies T1D and T2D using EHRs for Hong Kong residents enrolled in a diabetes register. The choice of algorithm should be tailored to the unique requirements of each study question.
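
The three type 1 rules share the code-ratio criterion and differ only in what they join it to, and how: OR with early insulin (high sensitivity), AND with multiple daily injections and no other glucose-lowering drug (high PPV), or AND with early insulin (optimized). The Python sketch below expresses them as predicates over a per-person summary. It is illustrative, not the authors' code: the `PersonSummary` fields are hypothetical, and the assumptions that the 90-day window starts at diagnosis, that a person with zero type 2 codes meets the ratio if they have any type 1 code, and that anyone not flagged as type 1 is classified as type 2 are ours; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PersonSummary:
    """Hypothetical per-person EHR summary; field names are illustrative only."""
    t1_codes: int                          # count of type 1 diabetes encounter codes
    t2_codes: int                          # count of type 2 diabetes encounter codes
    days_to_first_insulin: Optional[int]   # days from diagnosis to first insulin prescription (None = never)
    multiple_daily_injections: bool        # prescribed a multiple-daily-injection insulin regimen
    other_glucose_lowering: bool           # any non-insulin glucose-lowering prescription


def code_ratio_ok(p: PersonSummary, threshold: float = 4.0) -> bool:
    """Ratio of type 1 to type 2 codes >= threshold.

    Assumption: with zero type 2 codes, at least one type 1 code meets the ratio.
    """
    if p.t2_codes == 0:
        return p.t1_codes > 0
    return p.t1_codes / p.t2_codes >= threshold


def insulin_within(p: PersonSummary, window_days: int = 90) -> bool:
    """At least 1 insulin prescription within the window (assumed to start at diagnosis)."""
    d = p.days_to_first_insulin
    return d is not None and 0 <= d <= window_days


def mdi_without_other_drugs(p: PersonSummary) -> bool:
    """Multiple daily injections and no other glucose-lowering prescription."""
    return p.multiple_daily_injections and not p.other_glucose_lowering


def high_sensitivity_t1(p: PersonSummary) -> bool:
    return code_ratio_ok(p) or insulin_within(p)


def high_ppv_t1(p: PersonSummary) -> bool:
    return code_ratio_ok(p) and mdi_without_other_drugs(p)


def optimized_t1(p: PersonSummary) -> bool:
    return code_ratio_ok(p) and insulin_within(p)


def classify(p: PersonSummary, rule: Callable[[PersonSummary], bool]) -> str:
    """Assumption: everyone in the register has diabetes, so 'not T1D' means T2D."""
    return "T1D" if rule(p) else "T2D"
```

For example, someone with 0 type 1 and 10 type 2 codes who received insulin 60 days after diagnosis is labelled T1D by the high-sensitivity rule (the OR admits them on insulin alone) but T2D by the other two rules, which illustrates why that rule trades PPV for sensitivity.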

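With T1D by clinical assessment as the positive class, sensitivity is TP/(TP + FN), PPV is TP/(TP + FP), and kappa measures agreement beyond chance. The abstract does not name its confidence-interval method; the stdlib-only Python sketch below assumes the exact (Clopper–Pearson) binomial interval, found by bisection on the binomial tail. The counts 41/43 and 16/16 used in the checks are our back-calculation from the reported figures (95.3% and 37.2% of the 43 validation T1D cases, the latter with 100% PPV), not numbers stated in the abstract.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_nfact = math.lgamma(n + 1)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (log_nfact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


def _solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for p with g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion x/n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    a = alpha / 2
    # Lower bound: P(X >= x | p) = a; this upper tail grows with p.
    lower = 0.0 if x == 0 else _solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), a)
    # Upper bound: P(X <= x | p) = a, i.e. P(X > x | p) = 1 - a.
    upper = 1.0 if x == n else _solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - a)
    return lower, upper


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2x2 table of algorithm vs reference classification."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the algorithm's and the reference's marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)
```

Under these assumptions, `clopper_pearson(41, 43)` gives roughly (0.842, 0.994) and `clopper_pearson(16, 16)` a lower bound of about 0.794, in line with the reported 84.2–99.4% and 79.4–100.0% intervals; that agreement is consistent with, but does not confirm, the assumed method.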
Keywords