Scientific Reports (Feb 2023)

Risk factors and geographic disparities in premature cardiovascular mortality in US counties: a machine learning approach

  • Weichuan Dong
  • Issam Motairek
  • Khurram Nasir
  • Zhuo Chen
  • Uriel Kim
  • Yassin Khalifa
  • Darcy Freedman
  • Stephanie Griggs
  • Sanjay Rajagopalan
  • Sadeer G. Al-Kindi

DOI
https://doi.org/10.1038/s41598-023-30188-9
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 11

Abstract


Disparities in premature cardiovascular mortality (PCVM) have been associated with socioeconomic, behavioral, and environmental risk factors. Understanding the “phenotypes”, or combinations of characteristics associated with the highest risk of PCVM, and the geographic distributions of these phenotypes is critical to targeting PCVM interventions. This study applied the classification and regression tree (CART) to identify county phenotypes of PCVM and geographic information systems to examine the distributions of identified phenotypes. Random forest analysis was applied to evaluate the relative importance of risk factors associated with PCVM. The CART analysis identified seven county phenotypes of PCVM, where high-risk phenotypes were characterized by having greater percentages of people with lower income, higher physical inactivity, and higher food insecurity. These high-risk phenotypes were mostly concentrated in the Black Belt of the American South and the Appalachian region. The random forest analysis identified additional important risk factors associated with PCVM, including broadband access, smoking, receipt of Supplemental Nutrition Assistance Program benefits, and educational attainment. Our study demonstrates the use of machine learning approaches in characterizing community-level phenotypes of PCVM. Interventions to reduce PCVM should be tailored according to these phenotypes in corresponding geographic areas.
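To illustrate how a CART analysis turns county-level risk factors into "phenotypes", the following is a minimal, hypothetical sketch rather than the authors' pipeline. It grows a small regression tree by repeatedly choosing the split that most reduces the sum of squared errors (the standard CART regression criterion), then reads each leaf's chain of rules as one phenotype. The feature names (low_income_pct, inactivity_pct, food_insecurity_pct), the eight counties, and their PCVM values are invented toy data, not figures from the study. The depth and minimum-leaf settings are likewise assumptions, and pruning is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    prediction: float                # mean outcome of counties in this node
    n: int                           # number of counties in this node
    feature: Optional[str] = None    # split feature (None for a leaf)
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # counties with feature <= threshold
    right: Optional["Node"] = None   # counties with feature > threshold


def sse(values):
    """Sum of squared errors around the mean (CART regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, y, features, min_leaf):
    """Return (feature, threshold, gain) of the split that most reduces SSE."""
    parent = sse(y)
    best = None
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for r, yi in zip(rows, y) if r[f] <= t]
            right = [yi for r, yi in zip(rows, y) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            gain = parent - sse(left) - sse(right)
            if best is None or gain > best[2]:
                best = (f, t, gain)
    return best


def grow(rows, y, features, depth=0, max_depth=2, min_leaf=2):
    """Recursively split until max_depth or no split reduces SSE."""
    node = Node(prediction=sum(y) / len(y), n=len(y))
    if depth >= max_depth:
        return node
    split = best_split(rows, y, features, min_leaf)
    if split is None or split[2] <= 0:
        return node
    f, t, _ = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    node.feature, node.threshold = f, t
    node.left = grow([rows[i] for i in li], [y[i] for i in li],
                     features, depth + 1, max_depth, min_leaf)
    node.right = grow([rows[i] for i in ri], [y[i] for i in ri],
                      features, depth + 1, max_depth, min_leaf)
    return node


def phenotypes(node, path=()):
    """Each leaf is one phenotype: (rule text, county count, mean outcome)."""
    if node.feature is None:
        return [(" and ".join(path) or "all counties", node.n, node.prediction)]
    return (phenotypes(node.left, path + (f"{node.feature} <= {node.threshold:g}",))
            + phenotypes(node.right, path + (f"{node.feature} > {node.threshold:g}",)))


# Hypothetical toy counties (invented values, not study data).
FEATURES = ["low_income_pct", "inactivity_pct", "food_insecurity_pct"]
rows = [
    {"low_income_pct": 10, "inactivity_pct": 20, "food_insecurity_pct": 12},
    {"low_income_pct": 12, "inactivity_pct": 22, "food_insecurity_pct": 9},
    {"low_income_pct": 11, "inactivity_pct": 30, "food_insecurity_pct": 14},
    {"low_income_pct": 13, "inactivity_pct": 31, "food_insecurity_pct": 10},
    {"low_income_pct": 25, "inactivity_pct": 21, "food_insecurity_pct": 13},
    {"low_income_pct": 27, "inactivity_pct": 23, "food_insecurity_pct": 18},
    {"low_income_pct": 26, "inactivity_pct": 32, "food_insecurity_pct": 17},
    {"low_income_pct": 28, "inactivity_pct": 33, "food_insecurity_pct": 19},
]
pcvm = [40, 45, 50, 55, 90, 95, 120, 125]  # toy deaths per 100,000

tree = grow(rows, pcvm, FEATURES)
for rule, n, mean in sorted(phenotypes(tree), key=lambda p: p[2], reverse=True):
    print(f"{mean:6.1f}  n={n}  {rule}")
```

On these toy inputs the tree first splits on the income variable and then on physical inactivity, so the highest-mean leaf combines both conditions. This mirrors, in miniature, how a leaf of the fitted tree can be read as a county phenotype. A real analysis would also need cross-validated pruning and a much larger set of candidate risk factors.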