Preventive Medicine Reports (Jul 2024)

Weighting the United States All of Us Research Program data to known population estimates using raking

  • Vivian Hsing-Chun Wang,
  • Jingwen Lei,
  • Tingjia Shi,
  • José A. Pagán

Journal volume & issue
Vol. 43
p. 102795

Abstract


Background: The All of Us Research Program aims to collect longitudinal health-related data from a million individuals in the United States. Because All of Us relies on voluntary participation, a non-probability sampling strategy, its findings may not be nationally representative, which limits their use for addressing health and health care at the population level. We generated survey weights for the All of Us data that can be used to address this challenge.

Research design: We developed raked weights using demographic, health, and socioeconomic variables available in both the 2020 National Health Interview Survey (NHIS) and All of Us. We then compared the unweighted and weighted prevalence of a set of health-related variables (health behaviors, health conditions, and health insurance coverage) estimated from All of Us data with the weighted prevalence estimates obtained from NHIS data.

Subjects: The sample included 100,391 All of Us participants 18 years of age and older with complete data collected between May 2017 and January 2022 across the United States.

Results: Final variables in the raking procedure included age, sex, race/ethnicity, region of residence, annual household income, and home ownership. After the raked weights were applied, the mean percentage difference between the known proportions from the NHIS and the All of Us estimates for health-related variables was reduced by 18.89%.

Conclusions: Raking improved the comparability of prevalence estimates obtained from All of Us to known national prevalence estimates. Refining the process of variable selection for raking may further improve the comparability between All of Us and nationally representative data.
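
Raking (iterative proportional fitting) cycles through the weighting variables, each time rescaling the weights so that the weighted distribution of that variable matches its population margin, and repeats until no weight changes appreciably. The sketch below illustrates that general technique in plain Python. It is not the authors' code. It assumes the population margins are already known, as the NHIS estimates were in the study. The function names `rake` and `weighted_prevalence`, the two-variable toy sample, and the target proportions are all hypothetical; the study raked on age, sex, race/ethnicity, region of residence, annual household income, and home ownership.

```python
from collections import defaultdict


def rake(rows, targets, base_weights=None, max_iter=100, tol=1e-8):
    """Raking (iterative proportional fitting) of survey weights.

    rows: list of dicts, one per respondent, mapping each weighting
        variable to that respondent's category.
    targets: {variable: {category: population proportion}}. Assumed: each
        margin sums to 1, and the categories in rows and targets coincide.
    Returns weights whose weighted margins match the targets.
    """
    weights = list(base_weights) if base_weights is not None else [1.0] * len(rows)
    total = sum(weights)  # preserved because each margin sums to 1
    for _ in range(max_iter):
        max_change = 0.0
        for var, margin in targets.items():
            # Current weighted total of each category of this variable.
            current = defaultdict(float)
            for row, w in zip(rows, weights):
                current[row[var]] += w
            # Factor that moves each category's weighted share to its target.
            factors = {cat: margin[cat] * total / current[cat] for cat in current}
            max_change = max(max_change, max(abs(f - 1.0) for f in factors.values()))
            weights = [w * factors[row[var]] for row, w in zip(rows, weights)]
        # Stop once a full pass over all variables barely changes any weight.
        if max_change < tol:
            break
    return weights


def weighted_prevalence(rows, weights, var, value=1):
    """Weighted share of respondents whose rows[i][var] equals value."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == value)
    return hit / sum(weights)


# Hypothetical toy sample and margins (not All of Us or NHIS figures).
rows = [
    {"sex": "F", "age": "18-44", "smoker": 0},
    {"sex": "F", "age": "18-44", "smoker": 0},
    {"sex": "F", "age": "45+", "smoker": 1},
    {"sex": "F", "age": "45+", "smoker": 0},
    {"sex": "M", "age": "18-44", "smoker": 1},
    {"sex": "M", "age": "18-44", "smoker": 0},
    {"sex": "M", "age": "45+", "smoker": 0},
]
targets = {
    "sex": {"F": 0.51, "M": 0.49},
    "age": {"18-44": 0.45, "45+": 0.55},
}

weights = rake(rows, targets)
print("unweighted smoking prevalence:", weighted_prevalence(rows, [1.0] * len(rows), "smoker"))
print("raked smoking prevalence:", weighted_prevalence(rows, weights, "smoker"))
```

Comparing the unweighted and raked prevalence against a benchmark mirrors, in miniature, the study's comparison of All of Us estimates with NHIS estimates. Production weighting code commonly also trims or caps extreme weights, which this sketch omits for brevity.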

Keywords