Clinical Epidemiology (Nov 2022)

Advancing an Algorithm for the Identification of Patients with High Data-Continuity in Electronic Health Records

  • Merola D,
  • Schneeweiss S,
  • Jin Y,
  • Lii J,
  • Lin KJ

Journal volume & issue
Vol. 14
pp. 1339 – 1349

Abstract

David Merola,1,2 Sebastian Schneeweiss,1,2 Yinzhu Jin,1 Joyce Lii,1 Kueiyu Joshua Lin1,3

1Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA; 2Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA; 3Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Correspondence: Kueiyu Joshua Lin, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, 1620 Tremont St. Suite 3030, Boston, MA, 02120, USA, Tel +1 617 278-0930, Fax +1 617 232-8602, Email [email protected]

Identifying high data-continuity patients in an electronic health record (EHR) system may facilitate selecting cohorts with a lower degree of variable misclassification and promote study validity. We updated a previously developed algorithm for identifying patients with high EHR data-completeness by adding demographic and health utilization factors to improve adaptability to networks serving patients of diverse backgrounds. We also expanded the algorithm to accommodate data in the ICD-10 era.

Methods: We used Medicare claims linked with EHR data to identify individuals aged ≥ 65 years. EHR-continuity was defined as the proportion of encounters captured in EHR data relative to claims. We compared the model with additional demographic factors and their interaction terms with other predictors with the original algorithm and assessed the performance by area under the ROC curve (AUC) and net reclassification index (NRI).

Results: The study cohort consisted of 264,099 subjects. The updated prediction model had an AUC of 0.93 in the validation set. Compared to the previous model, the new model had an NRI of 37.4% (p < 0.001) for EHR-continuity classification. Interaction terms between demographic variables and other predictors did not improve the performance. Patients within the top 20% of predicted EHR-continuity had four times less misclassification of key variables compared to the remaining population.

Conclusion: Adding demographic and healthcare utilization variables significantly improved the model performance. Patients with high predicted EHR-continuity had less misclassification of study variables compared to the remaining population in both ICD-9 and 10 eras.

Keywords: electronic medical records, comparative effectiveness research, information bias, data continuity
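The two quantities named in the Methods can be illustrated with a minimal Python sketch. It assumes a simple per-patient ratio for EHR-continuity (encounters found in the EHR divided by encounters found in claims) and the standard two-category form of the net reclassification index. The abstract does not give the exact operational definitions, so the function names, arguments, and toy data below are hypothetical and not the authors' implementation.

```python
from typing import Sequence


def ehr_continuity(ehr_encounters: int, claims_encounters: int) -> float:
    """Proportion of claims-recorded encounters that are also captured in the EHR.

    Assumption: claims are treated as the complete record of encounters.
    """
    if claims_encounters <= 0:
        raise ValueError("claims_encounters must be positive")
    return ehr_encounters / claims_encounters


def net_reclassification_index(
    old_high: Sequence[bool],
    new_high: Sequence[bool],
    is_event: Sequence[bool],
) -> float:
    """Two-category NRI comparing a new classifier against an old one.

    old_high / new_high: whether each model classifies the patient as
    high-continuity. is_event: whether the patient truly is high-continuity.
    """
    up_event = down_event = n_event = 0
    up_non = down_non = n_non = 0
    for old, new, event in zip(old_high, new_high, is_event):
        moved_up = new and not old      # reclassified into the high category
        moved_down = old and not new    # reclassified out of the high category
        if event:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_non += 1
            up_non += moved_up
            down_non += moved_down
    if n_event == 0 or n_non == 0:
        raise ValueError("need at least one event and one non-event")
    # Upward moves are correct for events; downward moves are correct for non-events.
    return (up_event - down_event) / n_event + (down_non - up_non) / n_non


# Hypothetical toy data: four patients, two truly high-continuity.
old = [False, False, True, True]
new = [True, False, True, False]
truth = [True, True, False, False]
toy_nri = net_reclassification_index(old, new, truth)
toy_continuity = ehr_continuity(6, 8)
```

In this toy example one of the two true high-continuity patients is moved up by the new model and one of the two others is moved down, so each component contributes 0.5. Real use would first apply a threshold to each model's predicted continuity to obtain the boolean classifications.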

Keywords