JMIR Formative Research (Dec 2023)
Identification of Hypertension in Electronic Health Records Through Computable Phenotype Development and Validation for Use in Public Health Surveillance: Retrospective Study
Abstract
Background
Electronic health record (EHR) systems are widely used in the United States to document care delivery and outcomes. Health information exchange (HIE) networks, which integrate EHR data from the various health care providers treating patients, are increasingly used to analyze population-level data. EHR data from HIE networks could complement the existing methods public health authorities use for population health surveillance of essential hypertension, helping to characterize disease burden at the community level.
Objective
We aimed to derive and validate computable phenotypes (CPs) to estimate hypertension prevalence for population-based surveillance using an HIE network.
Methods
Using existing data from an HIE network, we developed 6 candidate CPs for essential (primary) hypertension in an adult population from a medium-sized Midwestern metropolitan area in the United States. Two independent clinician reviewers validated the phenotypes through a manual chart review of 150 randomly selected patient records. We assessed CP performance by calculating sensitivity, specificity, positive predictive value (PPV), and F1-score, and we assessed the interrater reliability of the chart reviews using prevalence-adjusted bias-adjusted κ (PABAK). We then used the most balanced CP to estimate the prevalence of hypertension in the population.
Results
Among a cohort of 548,232 adults, the 6 CPs produced PPVs ranging from 71% (95% CI 64.3%-76.9%) to 95.7% (95% CI 84.9%-98.9%) and F1-scores ranging from 0.40 to 0.91. The PABAK for hypertension status was 0.88, indicating high agreement between reviewers. Interrater agreement for individual phenotype determinations was also substantial (range 0.70-0.88) for all 6 phenotypes examined. A phenotype based solely on diagnostic codes performed reasonably (F1-score=0.63; PPV=95.1%) but was imbalanced, with low sensitivity (47.6%).
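The validation statistics named above have standard definitions over a 2×2 table of phenotype result versus chart-review reference. The Python sketch below is a minimal illustration under that assumption, not the study's analysis code; `ConfusionCounts` and the helper function names are hypothetical. The F1-score is the harmonic mean of PPV (precision) and sensitivity (recall), and for a binary rating by 2 raters the prevalence-adjusted bias-adjusted κ reduces to 2p_o - 1, where p_o is the observed proportion of agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 counts: phenotype result vs chart-review reference."""
    tp: int  # phenotype positive, chart review positive
    fp: int  # phenotype positive, chart review negative
    fn: int  # phenotype negative, chart review positive
    tn: int  # phenotype negative, chart review negative


def sensitivity(c: ConfusionCounts) -> float:
    """Share of reference-positive records that the phenotype flags."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of reference-negative records that the phenotype leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Share of phenotype-positive records confirmed by chart review (precision)."""
    return c.tp / (c.tp + c.fp)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * precision * recall / (precision + recall)


def pabak_binary(p_observed: float) -> float:
    """PABAK for 2 raters and 2 categories: 2 * p_o - 1."""
    return 2 * p_observed - 1


def observed_agreement_from_pabak(kappa: float) -> float:
    """Invert the binary PABAK formula to recover observed agreement p_o."""
    return (kappa + 1) / 2


# Diagnosis-only phenotype, reported PPV 95.1% and sensitivity 47.6%:
print(round(f1_score(0.951, 0.476), 2))  # 0.63
# Reported PABAK of 0.88 for hypertension status (binary rating assumed):
print(round(observed_agreement_from_pabak(0.88), 2))  # 0.94
```

As a consistency check, the diagnosis-only phenotype's reported PPV and sensitivity reproduce its reported F1-score of 0.63. Assuming a binary yes/no hypertension rating, a κ of 0.88 corresponds to the reviewers agreeing on about 94% of the records they both rated.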
The most balanced phenotype (F1-score=0.91; PPV=83.5%) included diagnoses, blood pressure measurements, and medications, and it identified 210,764 (38.4%) individuals with hypertension during the study period (2014-2015).
Conclusions
We identified several high-performing phenotypes for estimating essential hypertension prevalence for local public health surveillance using EHR data. Given the increasing availability of EHR systems in the United States and other nations, leveraging EHR data has the potential to enhance surveillance of chronic disease in health systems and communities. Because phenotype performance varies, however, public health authorities will need to decide whether to seek an optimal balance or to prefer algorithms that lean toward sensitivity or specificity when estimating the population prevalence of disease.
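The balanced phenotype drew on 3 evidence sources: diagnoses, blood pressure measurements, and medications. The abstract does not give its exact logic, so the Python sketch below assumes an "any of" rule and uses hypothetical blood pressure criteria (140/90 mm Hg on at least 2 readings) and a hypothetical `PatientRecord` structure purely for illustration; it is not the study's CP definition. It shows how extra evidence sources can capture patients that a diagnosis-only rule misses, and it reproduces the reported prevalence arithmetic.

```python
from dataclasses import dataclass, field

# Hypothetical criteria for illustration only; the abstract does not state
# the study's actual blood pressure thresholds or combination logic.
SBP_THRESHOLD = 140  # systolic, mm Hg
DBP_THRESHOLD = 90   # diastolic, mm Hg
MIN_ELEVATED_READINGS = 2


@dataclass
class PatientRecord:
    """Hypothetical per-patient summary extracted from HIE data."""
    has_htn_diagnosis: bool  # essential hypertension diagnosis code present
    bp_readings: list[tuple[int, int]] = field(default_factory=list)  # (systolic, diastolic)
    on_antihypertensive: bool = False  # antihypertensive medication present


def has_elevated_bp(record: PatientRecord) -> bool:
    """True if enough readings meet the hypothetical systolic or diastolic threshold."""
    elevated = sum(
        1 for sbp, dbp in record.bp_readings
        if sbp >= SBP_THRESHOLD or dbp >= DBP_THRESHOLD
    )
    return elevated >= MIN_ELEVATED_READINGS


def diagnosis_only_cp(record: PatientRecord) -> bool:
    """Phenotype based solely on diagnostic codes."""
    return record.has_htn_diagnosis


def combined_cp(record: PatientRecord) -> bool:
    """Assumed any-of rule over diagnosis, BP measurements, and medications."""
    return (record.has_htn_diagnosis
            or has_elevated_bp(record)
            or record.on_antihypertensive)


def prevalence(cases: int, population: int) -> float:
    """Proportion of the population flagged by a phenotype."""
    return cases / population


# An undiagnosed patient with repeated high readings: missed by diagnosis-only.
patient = PatientRecord(has_htn_diagnosis=False, bp_readings=[(152, 88), (146, 92)])
print(diagnosis_only_cp(patient), combined_cp(patient))  # False True

# Reported figures for the balanced phenotype: 210,764 of 548,232 adults.
print(f"{prevalence(210_764, 548_232):.1%}")  # 38.4%
```

Under an any-of rule, each added evidence source can only add patients, which tends to raise sensitivity at the cost of more false positives (and therefore lower PPV); this is the sensitivity-specificity trade-off that the Conclusions ask public health authorities to weigh.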