BMC Medical Research Methodology (Mar 2025)

A flexible framework for local-level estimation of the effective reproductive number in geographic regions with sparse data

  • Md Sakhawat Hossain,
  • Ravi Goyal,
  • Natasha K. Martin,
  • Victor DeGruttola,
  • Mohammad Mihrab Chowdhury,
  • Christopher McMahan,
  • Lior Rennert

DOI: https://doi.org/10.1186/s12874-025-02525-1
Journal volume & issue: Vol. 25, no. 1, pp. 1–11

Abstract


Background
Our research focuses on local-level estimation of the effective reproductive number ($R_t$), which describes the transmissibility of an infectious disease and represents the average number of people one infectious individual infects at a given time. Accurately estimating $R_t$ in geographically granular regions is critical for disaster planning and resource allocation. However, not all regions have sufficient infectious disease outcome data, and this lack of data makes accurate estimation challenging.

Methods
To overcome this challenge, we propose a two-step approach. In step 1, existing $R_t$ estimation procedures (EpiEstim, EpiFilter, EpiNow2) are applied to geographic regions with sufficient data. In step 2, these estimates are incorporated into a covariate-adjusted Bayesian spatial model, fitted with Integrated Nested Laplace Approximation (INLA), to predict $R_t$ in regions with sparse or missing data. This flexible framework allows any existing $R_t$ estimation procedure to be extended to regions with coarse or entirely missing data. We evaluate the method's predictive performance through external validation and a simulation study.

Results
We applied our method to estimate $R_t$ using data from South Carolina (SC) counties and ZIP codes during the first COVID-19 wave (‘Wave 1’, June 16, 2020 – August 31, 2020) and the second wave (‘Wave 2’, December 16, 2020 – March 2, 2021). Among the three step-1 methods, EpiNow2 yielded the most accurate $R_t$ predictions in regions with entirely missing data. Median county-level percentage agreement (PA) was 90.9% (interquartile range [IQR]: 89.9–92.0%) for Wave 1 and 92.5% (IQR: 91.6–93.4%) for Wave 2. Median ZIP code-level PA was 95.2% (IQR: 94.4–95.7%) for Wave 1 and 96.5% (IQR: 95.8–97.1%) for Wave 2.
Across both waves and geographic granularities, EpiEstim, EpiFilter, and an ensemble-based approach yielded median PA of 81.9–90.0%, 87.2–92.1%, and 88.4–90.9%, respectively.

Conclusion
The proposed framework yields high prediction accuracy for regions with coarse or missing data, making it a useful tool for small-area estimation of $R_t$.
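
The two-step idea can be illustrated with a deliberately simplified sketch in plain Python. Step 1 reimplements the renewal-equation estimator of Cori et al., on which EpiEstim is based: it computes the total infectiousness $\Lambda_t = \sum_{s \ge 1} I_{t-s} w_s$ from a discretized serial-interval distribution $w_s$ and returns the posterior mean of $R_t$ over a sliding window under a Gamma prior. The 7-day window and the prior (shape 1, scale 5, i.e. mean 5 and SD 5) are assumptions chosen to resemble EpiEstim's defaults, not settings reported in this study. Step 2 is not the authors' INLA model: the hypothetical helper `predict_missing` swaps in a crude stand-in that regresses log $R_t$ on a single covariate across data-rich regions and adds the mean residual of observed neighboring regions, only to show how step-1 estimates can feed a covariate-adjusted spatial prediction for a region with no data. All function names and the toy inputs are illustrative.

```python
import math


def total_infectiousness(incidence, si_pmf):
    """Lambda_t = sum_{s>=1} I_{t-s} * w_s, where si_pmf[s] is the assumed
    serial-interval probability at a lag of s days (si_pmf[0] is ignored)."""
    lam = []
    for t in range(len(incidence)):
        total = 0.0
        for s in range(1, min(t, len(si_pmf) - 1) + 1):
            total += incidence[t - s] * si_pmf[s]
        lam.append(total)
    return lam


def cori_rt(incidence, si_pmf, window=7, prior_shape=1.0, prior_scale=5.0):
    """Step 1 sketch: Cori-style posterior mean of R_t over a sliding window.

    With a Gamma(shape=a, scale=b) prior, the window posterior is
    Gamma(a + sum I_k, 1 / (1/b + sum Lambda_k)); its mean is shape * scale.
    Returns {t: estimate} for t >= window, so no window includes day 0.
    """
    lam = total_infectiousness(incidence, si_pmf)
    estimates = {}
    for t in range(window, len(incidence)):
        cases = sum(incidence[t - window + 1 : t + 1])
        load = sum(lam[t - window + 1 : t + 1])
        shape = prior_shape + cases
        scale = 1.0 / (1.0 / prior_scale + load)
        estimates[t] = shape * scale
    return estimates


def predict_missing(log_rt, covariate, neighbors):
    """Hypothetical step-2 stand-in (NOT INLA): OLS of log R_t on one covariate
    over observed regions, plus the mean residual of observed neighbors.

    log_rt:    {region: log R_t} for regions with step-1 estimates
    covariate: {region: value} for every region
    neighbors: {region: [adjacent regions]}
    """
    obs = list(log_rt)
    xs = [covariate[r] for r in obs]
    ys = [log_rt[r] for r in obs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta1 = (sxy / sxx) if sxx > 0 else 0.0
    beta0 = y_bar - beta1 * x_bar
    # Residuals carry the spatial signal the covariate does not explain.
    resid = {r: log_rt[r] - (beta0 + beta1 * covariate[r]) for r in obs}
    preds = {}
    for r in covariate:
        if r in log_rt:
            continue  # only predict regions lacking step-1 estimates
        nb = [resid[n] for n in neighbors.get(r, []) if n in resid]
        spatial = sum(nb) / len(nb) if nb else 0.0
        preds[r] = math.exp(beta0 + beta1 * covariate[r] + spatial)
    return preds


if __name__ == "__main__":
    si = [0.0, 0.2, 0.5, 0.3]  # toy serial-interval pmf over lags 0..3
    growing = [round(10 * 1.1 ** d) for d in range(30)]  # toy incidence
    flat = [50] * 30
    step1 = {"A": cori_rt(growing, si)[29], "B": cori_rt(flat, si)[29]}
    log_rt = {r: math.log(v) for r, v in step1.items()}
    covariate = {"A": 0.8, "B": 0.2, "C": 0.5}  # toy covariate; C has no data
    neighbors = {"C": ["A", "B"]}
    print(step1, predict_missing(log_rt, covariate, neighbors))
```

The split mirrors the modularity described in the Methods: `cori_rt` could be replaced by any other step-1 estimator without touching step 2, which only consumes per-region $R_t$ values.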

Keywords