npj Digital Medicine (Mar 2022)

A multicenter evaluation of computable phenotyping approaches for SARS-CoV-2 infection and COVID-19 hospitalizations

  • Rohan Khera,
  • Bobak J. Mortazavi,
  • Veer Sangha,
  • Frederick Warner,
  • H. Patrick Young,
  • Joseph S. Ross,
  • Nilay D. Shah,
  • Elitza S. Theel,
  • William G. Jenkinson,
  • Camille Knepper,
  • Karen Wang,
  • David Peaper,
  • Richard A. Martinello,
  • Cynthia A. Brandt,
  • Zhenqiu Lin,
  • Albert I. Ko,
  • Harlan M. Krumholz,
  • Benjamin D. Pollock,
  • Wade L. Schulz

DOI
https://doi.org/10.1038/s41746-022-00570-4
Journal volume & issue
Vol. 5, no. 1
pp. 1 – 9

Abstract

Diagnosis codes are used to study SARS-CoV-2 infections and COVID-19 hospitalizations in administrative and electronic health record (EHR) data. Using EHR data (April 2020–March 2021) at the Yale-New Haven Health System and the three hospital systems of the Mayo Clinic, computable phenotype definitions based on the ICD-10 diagnosis code for COVID-19 (U07.1) were evaluated against positive SARS-CoV-2 PCR or antigen tests. We included 69,423 patients at Yale and 75,748 at Mayo Clinic with either a diagnosis code or a positive SARS-CoV-2 test. The precision and recall of a COVID-19 diagnosis for a positive test were 68.8% and 83.3%, respectively, at Yale, with higher precision (95%) and lower recall (63.5%) at Mayo Clinic, ranging from 59.2% in Rochester to 97.3% in Arizona. For hospitalizations with a principal COVID-19 diagnosis, 94.8% at Yale and 80.5% at Mayo Clinic had an associated positive laboratory test, with a secondary diagnosis of COVID-19 identifying additional patients. These additional patients had twofold higher in-hospital mortality than those identified by a principal diagnosis. Standardization of coding practices is needed before diagnosis codes are used in clinical research and epidemiological surveillance of COVID-19.
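
As an illustration of the two metrics reported above, the following minimal Python sketch computes the precision and recall of a diagnosis-code phenotype against laboratory test results, treating a positive test as the reference standard. It is not the authors' code: the function name precision_recall, the patient identifiers, and the toy cohort are all hypothetical and chosen only to show the arithmetic.

```python
def precision_recall(code_positive, test_positive):
    """Precision and recall of a diagnosis-code phenotype.

    code_positive: IDs of patients carrying the diagnosis code (e.g. U07.1).
    test_positive: IDs of patients with a positive laboratory test,
                   used here as the reference standard.
    """
    code_positive = set(code_positive)
    test_positive = set(test_positive)

    # Patients flagged by the code who also have a positive test.
    true_positive = code_positive & test_positive

    # Precision: share of coded patients confirmed by a positive test.
    precision = (len(true_positive) / len(code_positive)
                 if code_positive else float("nan"))
    # Recall: share of test-positive patients captured by the code.
    recall = (len(true_positive) / len(test_positive)
              if test_positive else float("nan"))
    return precision, recall


# Hypothetical toy cohort; these are NOT data from the study.
with_code = {"p1", "p2", "p3", "p4"}
with_positive_test = {"p2", "p3", "p4", "p5", "p6"}

precision, recall = precision_recall(with_code, with_positive_test)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints: precision=0.75, recall=0.60
```

In this toy cohort, three of the four coded patients have a positive test (precision 0.75), while the code captures three of the five test-positive patients (recall 0.60).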