American Journal of Preventive Cardiology (Dec 2020)

County-level phenomapping to identify disparities in cardiovascular outcomes: An unsupervised clustering analysis

  • Matthew W. Segar,
  • Shreya Rao,
  • Ann Marie Navar,
  • Erin D. Michos,
  • Alana Lewis,
  • Adolfo Correa,
  • Mario Sims,
  • Amit Khera,
  • Amy E. Hughes,
  • Ambarish Pandey

Journal volume & issue
Vol. 4
p. 100118

Abstract

Introduction: Significant heterogeneity in cardiovascular disease (CVD) risk and healthcare resource allocation has been demonstrated in the United States, but the optimal methods for capturing heterogeneity in the county-level characteristics that contribute to differences in CVD mortality are unclear. We evaluated the feasibility of unsupervised machine learning (ML)-based phenomapping for identifying subgroups of counties, defined by social and demographic risk factors, that have differential CVD outcomes.
Methods: We performed a cross-sectional study using county-level data from 2008 to 2018 from the Centers for Disease Control and Prevention (CDC) WONDER platform and the 2020 Robert Wood Johnson Foundation County Health Rankings program. Unsupervised clustering was performed on 46 facets of population characteristics spanning the demographic, health behavior, socioeconomic, and healthcare access domains. Spatial autocorrelation was assessed using Moran's I test, and temporal trends in age-adjusted CVD outcomes were evaluated using linear mixed-effects models and least squares means.
Results: Among 2676 counties, 4 county-level phenogroups were identified (Moran's I p-value <0.001). Phenogroup 1 counties (N = 924; 34.5%) were largely white and suburban, with high household income and access to healthcare. Phenogroup 2 counties (N = 451; 16.9%) had predominantly Hispanic residents and a below-average prevalence of CVD risk factors. Phenogroup 3 counties (N = 951; 35.5%) were rural, with predominantly white residents and the lowest levels of access to healthcare. Phenogroup 4 counties (N = 350; 13.1%) had predominantly Black residents, substantial cardiovascular comorbidities, and physical and socioeconomic burdens. Least squares mean age-adjusted cardiovascular mortality over time increased stepwise from 223 per 100,000 residents in phenogroup 1 to 317 per 100,000 residents in phenogroup 4.
Conclusions: Unsupervised ML-based clustering on county-level population characteristics can identify unique phenogroups with differential risk of CVD mortality. Phenogroup identification may aid in developing a uniform set of preventive initiatives for clustered counties to address regional differences in CVD mortality.
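The Methods assess spatial autocorrelation with Moran's I, a statistic that is positive when neighboring counties have similar values and negative when they differ. The abstract does not describe how the test was implemented or which spatial weights were used, so the following is only a minimal sketch of the global statistic under assumed binary contiguity weights. The function name `morans_i`, the four-county layout, and the values are hypothetical and are not taken from the study.

```python
def morans_i(values, edges):
    """Global Moran's I for binary, symmetric contiguity weights.

    values: one numeric value per county (hypothetical data here).
    edges:  undirected (i, j) pairs of counties assumed to share a border.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Each undirected edge contributes twice, since w_ij = w_ji = 1.
    cross = 2.0 * sum(dev[i] * dev[j] for i, j in edges)
    total_weight = 2.0 * len(edges)
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical layout: four counties in a row, each bordering the next.
edges = [(0, 1), (1, 2), (2, 3)]
clustered = morans_i([2.0, 2.0, 0.0, 0.0], edges)    # similar neighbors
alternating = morans_i([2.0, 0.0, 2.0, 0.0], edges)  # dissimilar neighbors
print(clustered, alternating)
```

In this toy layout the clustered arrangement gives a positive value (about 0.33) and the alternating arrangement gives a negative value (about -1). A p-value such as the one reported in the Results would additionally require a permutation or analytic significance test, which this sketch omits.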

Keywords