SSM: Population Health (Jun 2021)
Comparing denominator sources for real-time disease incidence modeling: American Community Survey and WorldPop
Abstract
Across the United States public health community in 2020, in the midst of a pandemic and heightened attention to racial/ethnic health disparities, there is widespread concern about our ability to accurately estimate small-area disease incidence rates, because no recent census is available to supply reliable population denominators. Data from the 2010 decennial census are likely outdated, and intercensal population estimates from the Census Bureau, which are less temporally misaligned with real-time disease incidence data, are not recommended for use with small areas. Machine learning-based population estimates are an attractive option but have not been validated for use in epidemiologic studies. Treating 2010 decennial census counts as a “ground truth”, we conduct a case study to compare the performance of alternative small-area population denominator estimates from surrounding years for modeling real-time disease incidence rates. Our case study focuses on modeling health disparities in census tract incidence rates in Massachusetts, using population size estimates from the American Community Survey (ACS), the most commonly used source of intercensal small-area population data in epidemiology, and WorldPop, a machine learning model for high-resolution population size estimation. Through simulation studies and an analysis of real premature mortality data, we evaluate whether WorldPop denominators can provide improved performance relative to ACS denominators for quantifying disparities, using both census tract-aggregate and race-stratified modeling approaches. We find that the biases induced in parameter estimates by temporally incompatible incidence and denominator data tend to be larger for race-stratified models than for area-aggregate models. In most scenarios considered here, WorldPop denominators lead to greater bias in estimates of health disparities than ACS denominators.
These insights will help researchers working in intercensal years select appropriate population size estimates for modeling disparities in real-time disease incidence. We highlight implications for health disparity studies in the coming decade, as 2020 census counts may introduce new sources of error.