Stats (Jun 2021)
A Bayesian Approach to Linking a Survey and a Census via Small Areas
Abstract
We predict the finite population proportion for a small area when individual-level data are available from a survey and more extensive household-level data (covariates but not responses) are available from a census. The census and the survey share the same strata and primary sampling units (PSUs, or wards), which are matched, but the households are not matched. Some household-level covariates are common to the survey and the census, and these are used to link the households within wards. Ward-level covariates are also available, and the wards are the same in the survey and the census. Using a two-stage procedure, we first study the multinomial counts in the sampled households within the wards and then use a projection method to make inference about the non-sampled wards. The counts are accommodated by a multinomial-Dirichlet–Dirichlet model, a three-stage hierarchical Bayesian model for multinomial counts, because heterogeneity among the households must be accounted for. The key theoretical contribution of this paper is a computational algorithm for sampling from the joint posterior density of the multinomial-Dirichlet–Dirichlet model. Specifically, we obtain samples from the distributions of the proportions for each multinomial cell. The second key contribution is the use of two projection procedures on these proportions (a parametric one based on the nested error regression model and a non-parametric one based on iteratively re-weighted least squares) to link the survey to the census, thereby providing a copy of the census counts. We compare the multinomial-Dirichlet–Dirichlet (heterogeneous) model and the multinomial-Dirichlet (homogeneous) model without household effects via these two projection methods. An example using the second Nepal Living Standards Survey is presented.
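To make the three-stage structure concrete, the following is a minimal generative sketch of one plausible multinomial-Dirichlet–Dirichlet hierarchy. The abstract does not give the exact parameterization, so everything here is an assumption made for illustration: the function name `simulate_mdd`, the overall mean vector `mu`, the concentration parameters `tau` (ward level) and `gamma` (household level), and the fixed household size. It only simulates counts forward from the hierarchy; it is not the paper's posterior sampling algorithm.

```python
import numpy as np

def simulate_mdd(n_wards, n_households, household_size, mu, tau, gamma, seed=0):
    """Simulate cell counts from an ASSUMED three-stage hierarchy
    (hypothetical parameterization, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)

    # Ward level: each ward's cell proportions scatter around the overall mean mu.
    theta = rng.dirichlet(mu * tau, size=n_wards)            # shape (n_wards, K)

    counts = np.empty((n_wards, n_households, mu.size), dtype=int)
    for i in range(n_wards):
        # Household level: household proportions scatter around the ward's proportions.
        p = rng.dirichlet(theta[i] * gamma, size=n_households)
        for j in range(n_households):
            # Observation level: multinomial counts for the members of household j.
            counts[i, j] = rng.multinomial(household_size, p[j])
    return theta, counts

theta, counts = simulate_mdd(
    n_wards=4, n_households=5, household_size=6,
    mu=[0.5, 0.3, 0.2], tau=20.0, gamma=10.0, seed=1,
)
print(counts.sum(axis=1))  # cell totals per ward, summed over households
```

Under this assumed parameterization, a very large `gamma` makes every household in a ward share essentially the ward's proportions, which corresponds to the homogeneous multinomial-Dirichlet case without household effects; smaller values of `gamma` introduce the household-level heterogeneity that the three-stage model is meant to capture.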
Keywords