BMJ Public Health (Sep 2024)

Machine learning to evaluate the relationship between social determinants and diabetes prevalence in New York City

  • Ann Aerts,
  • Elizabeth Adamson,
  • William B Weeks,
  • Yongkang Zhang,
  • Darren Tanner,
  • Ji Eun Chang,
  • Peter Speyer,
  • Juan M Lavista Ferres

DOI
https://doi.org/10.1136/bmjph-2024-001394
Journal volume & issue
Vol. 2, no. 2

Abstract

Introduction Diabetes is a leading contributor to cardiovascular disease and mortality; social determinants of health (SDOH) are associated with disparities in diabetes risk. Quantifying the cumulative impact of SDOH and identifying the SDOH most associated with diabetes prevalence at the neighbourhood level can help policy-makers design and target local interventions to mitigate these disparities. Machine learning (ML) methods can provide novel insights and help inform public health intervention strategies in a place-based manner.

Methods In a cross-sectional study, we used gradient boosting ML models to estimate the cumulative contribution of a set of SDOH variables to diabetes prevalence (%) at the census tract level within New York City (NYC); Shapley Additive Explanations were used to assess the magnitude and shape of relationships between our SDOH variables and model-predicted NYC diabetes prevalence. SDOH measures included socioeconomic position, educational attainment, food access, air quality, neighbourhood environment, housing conditions and insurance coverage.

Results Across 2096 NYC census tracts (population 8 170 505), mean diabetes prevalence was 11.5% (SD 3.7%; range 1.9%–42.8%). A set of 16 SDOH variables representing a framework of 16 distinct SDOH concepts accounted for 67% of the between-tract variance in model-derived NYC diabetes prevalence estimates (95% CI 66% to 68%); a set of 81 variables representing these 16 concepts accounted for 80% of variance (95% CI 78% to 81%). Models showed excellent across-location generalisation. The most important variables driving model predictions within NYC were measures of low educational attainment and poverty.

Conclusions SDOH accounted for a substantial proportion of neighbourhood-level variation in diabetes prevalence within NYC, independent of the demographics and health behaviours associated with those SDOH. Our place-based findings suggest that, within NYC, where approximately one million residents have diabetes and there are legislative requirements to reduce the impacts from diabetes, policies reducing socioeconomic and educational inequality could have the greatest potential to equitably achieve this.
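
The general approach named in the Methods (a gradient boosting model whose predictions are then attributed to individual inputs with Shapley values) can be illustrated with a small, dependency-free sketch. This is not the authors' code or data: the three feature names, the synthetic outcome, the stump-based booster and all hyperparameters are assumptions chosen only to keep the example short. In practice a dedicated gradient boosting library and a Shapley explanation package would normally be used instead.

```python
import itertools
import math
import random

random.seed(0)

# SYNTHETIC data (not from the study): 100 hypothetical "tracts" with three
# made-up features scaled to [0, 1]. Only the first two affect the outcome.
FEATURES = ["low_education", "poverty", "unrelated"]
X = [[random.random() for _ in FEATURES] for _ in range(100)]
y = [8 + 10 * a + 6 * b + random.gauss(0, 0.5) for a, b, _ in X]


def fit_stump(X, residuals):
    """Return (feature, threshold, left_mean, right_mean) minimising squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]


def fit_gbm(X, y, n_rounds=40, learning_rate=0.2):
    """Least-squares gradient boosting: each stump is fitted to current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        preds = [p + learning_rate * (lm if row[j] <= t else rm)
                 for row, p in zip(X, preds)]

    def predict(row):
        return base + learning_rate * sum(
            lm if row[j] <= t else rm for j, t, lm, rm in stumps)

    return predict


def shapley_values(predict, x, background):
    """Exact Shapley values for one row; absent features are filled from background rows."""
    n = len(x)

    def value(subset):
        total = 0.0
        for b in background:
            total += predict([x[k] if k in subset else b[k] for k in range(n)])
        return total / len(background)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        contrib = 0.0
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for S in itertools.combinations(others, size):
                contrib += weight * (value(set(S) | {i}) - value(set(S)))
        phi.append(contrib)
    return phi


predict = fit_gbm(X, y)

# Share of variance in the outcome accounted for by the model (in-sample R^2).
mean_y = sum(y) / len(y)
sse = sum((yi - predict(row)) ** 2 for row, yi in zip(X, y))
sst = sum((yi - mean_y) ** 2 for yi in y)
r2 = 1 - sse / sst

# Global importance: mean absolute Shapley value over a sample of rows.
background = X[:30]
all_phi = [shapley_values(predict, row, background) for row in X[:30]]
importance = {name: sum(abs(phi[k]) for phi in all_phi) / len(all_phi)
              for k, name in enumerate(FEATURES)}

print(f"R^2 = {r2:.2f}")
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")
```

Exact enumeration of feature subsets is only feasible for a handful of inputs; with 16 or 81 variables, as in the abstract, an approximate or tree-specific Shapley algorithm would be needed. The R² here is computed in-sample on synthetic data purely to show the calculation and says nothing about the study's reported figures.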