Weighting the United States All of Us Research Program data to known population estimates using raking

Vivian Hsing-Chun Wang; Jingwen Lei; Tingjia Shi; José A. Pagán

doi:10.1016/j.pmedr.2024.102795

Preventive Medicine Reports (Jul 2024)

Affiliations

Vivian Hsing-Chun Wang: Center for Population Health & Health Services Research, Department of Foundations of Medicine, NYU Grossman Long Island School of Medicine, Mineola, NY, USA; Corresponding author at: Center for Population Health & Health Services Research, Department of Foundations of Medicine, NYU Grossman Long Island School of Medicine, 101 Mineola Boulevard, Mineola, NY 11501.
Jingwen Lei: Department of Biostatistics, School of Global Public Health, New York University, New York, NY, USA
Tingjia Shi: Department of Biostatistics, School of Global Public Health, New York University, New York, NY, USA
José A. Pagán: Department of Public Health Policy and Management, School of Global Public Health, New York University, New York, NY, USA

DOI: https://doi.org/10.1016/j.pmedr.2024.102795
Journal volume & issue: Vol. 43, p. 102795

Abstract

Background: The All of Us Research Program aims to collect longitudinal health-related data from a million individuals in the United States. An inherent challenge of a non-probability sampling strategy through voluntary participation in All of Us is that findings may not be nationally representative for addressing health and health care at the population level. We generated survey weights for the All of Us data that can be used to address the challenge.

Research design: We developed raked weights using demographic, health, and socioeconomic variables available in both the 2020 National Health Interview Survey (NHIS) and All of Us. We then compared the unweighted and weighted prevalence of a set of health-related variables (health behaviors, health conditions, and health insurance coverage) estimated from All of Us data with the weighted prevalence estimates obtained from NHIS data.

Subjects: The sample included 100,391 All of Us participants 18 years of age and older with complete data collected between May 2017 and January 2022 across the United States.

Results: Final variables in the raking procedure included age, sex, race/ethnicity, region of residence, annual household income, and home ownership. The mean percentage difference between known proportions obtained from the NHIS and All of Us was reduced by 18.89% for health-related variables after applying the raked weights.

Conclusions: Raking improved the comparability of prevalence estimates obtained from All of Us to known national prevalence estimates. Refining the process of variable selection for raking may further improve the comparability between All of Us and nationally representative data.
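
For readers unfamiliar with raking (iterative proportional fitting), the following is a minimal illustrative sketch in Python. It is not the authors' code. The function names `rake` and `weighted_share`, the two-variable toy sample, and the target margins are all hypothetical and are not taken from NHIS or All of Us. The sketch also assumes that every category in the targets appears at least once in the sample.

```python
from collections import defaultdict


def rake(records, targets, max_iter=100, tol=1e-8):
    """Adjust unit weights until weighted margins match the target margins.

    records: list of dicts mapping variable name -> category
    targets: dict mapping variable name -> {category: target proportion}
    Returns a list of weights rescaled to sum to len(records).
    Assumes every target category occurs in the sample (illustrative only).
    """
    n = len(records)
    weights = [1.0] * n
    for _ in range(max_iter):
        max_gap = 0.0
        # Adjust one variable at a time, cycling through all variables.
        for var, target in targets.items():
            totals = defaultdict(float)
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            grand = sum(totals.values())
            factors = {}
            for cat, share in target.items():
                current = totals[cat] / grand
                max_gap = max(max_gap, abs(current - share))
                factors[cat] = share / current
            weights = [w * factors[rec[var]] for rec, w in zip(records, weights)]
        # Stop once every margin was already within tolerance this pass.
        if max_gap < tol:
            break
    scale = n / sum(weights)
    return [w * scale for w in weights]


def weighted_share(records, weights, var, cat):
    """Weighted proportion of records whose `var` equals `cat`."""
    total = sum(weights)
    return sum(w for r, w in zip(records, weights) if r[var] == cat) / total


# Hypothetical toy sample: 6 women and 4 men, unevenly spread over two age groups.
sample = (
    [{"sex": "F", "age": "18-44"}] * 3
    + [{"sex": "F", "age": "45+"}] * 3
    + [{"sex": "M", "age": "18-44"}] * 1
    + [{"sex": "M", "age": "45+"}] * 3
)

# Hypothetical population margins (not real NHIS figures).
margins = {
    "sex": {"F": 0.5, "M": 0.5},
    "age": {"18-44": 0.55, "45+": 0.45},
}

weights = rake(sample, margins)
print(round(weighted_share(sample, weights, "sex", "F"), 4))
print(round(weighted_share(sample, weights, "age", "18-44"), 4))
```

The resulting weights can be passed to any weighted prevalence calculation, which is how the weighted All of Us estimates described above would be compared against a reference survey. A production analysis would more likely rely on an established survey package, and would typically also consider trimming extreme weights.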

Published in Preventive Medicine Reports

ISSN: 2211-3355 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine
Website: https://www.sciencedirect.com/journal/preventive-medicine-reports
