Model Selection when multiple imputation is used to protect confidentiality in public use data

Satkartar K. Kinney; Jerome P. Reiter; James O. Berger

doi:10.29012/jpc.v2i2.588

The Journal of Privacy and Confidentiality (Apr 2011)

Affiliations

Satkartar K. Kinney: National Institute of Statistical Sciences
Jerome P. Reiter: Duke University
James O. Berger: Duke University

Journal volume & issue: Vol. 2, no. 2

Abstract

Several statistical agencies use, or are considering the use of, multiple imputation to limit the risk of disclosing respondents' identities or sensitive attributes in public use files. For example, agencies can release partially synthetic datasets, which comprise the originally surveyed units with some values (such as sensitive values at high risk of disclosure, or values of key identifiers) replaced by multiple imputations. We describe how secondary analysts of such multiply-imputed datasets can implement Bayesian model selection procedures that appropriately condition on the multiple datasets and on the information the agency releases about its imputation models. We illustrate by deriving Bayes factor approximations and a data augmentation step for stochastic search variable selection algorithms.
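
To make the idea of a partially synthetic release concrete, here is a minimal Python sketch. It is not the authors' procedure. It assumes a single confidential continuous variable `y`, modeled by a normal linear regression on released covariates `X` under the noninformative prior p(beta, sigma^2) proportional to 1/sigma^2. The helper name `partially_synthesize` and the simulated data are hypothetical. Each of the `m` copies replaces every value of `y` with a posterior predictive draw. The sketch then averages the per-copy slope estimates, which is the usual point estimate for partially synthetic data.

```python
import numpy as np


def partially_synthesize(X, y, m, rng):
    """Return m partially synthetic copies of (X, y) with y replaced.

    Hypothetical helper. Assumes y = X @ beta + e, e ~ N(0, sigma^2), and the
    noninformative prior p(beta, sigma^2) proportional to 1 / sigma^2. The
    covariates X (n x p, including an intercept column) are released as-is.
    """
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - p)

    copies = []
    for _ in range(m):
        # Posterior draw: sigma^2 | y is scaled inverse chi-square(n - p, s2).
        sigma2 = (n - p) * s2 / rng.chisquare(n - p)
        # Posterior draw: beta | sigma^2, y ~ N(beta_hat, sigma^2 (X'X)^{-1}).
        beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
        # Replace every confidential y with a posterior predictive draw.
        y_syn = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
        copies.append((X.copy(), y_syn))
    return copies


# Simulated "confidential" data (hypothetical): intercept 1, slope 2.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(size=n)

datasets = partially_synthesize(X, y, m=5, rng=rng)

# A secondary analyst fits the same model to each released copy; the usual
# point estimate for partially synthetic data is the average across copies.
slopes = [np.linalg.lstsq(Xs, ys, rcond=None)[0][1] for Xs, ys in datasets]
q_bar = float(np.mean(slopes))
print(f"per-copy slopes: {np.round(slopes, 3)}; average: {q_bar:.3f}")
```

To replace only high-risk records, as the abstract describes, you would overwrite a subset of `y` instead of all of it. The sketch covers only the release side and simple point estimation. It does not implement the Bayes factor approximations or the stochastic search variable selection step.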

Published in The Journal of Privacy and Confidentiality

ISSN: 2575-8527 (Online)
Publisher: Labor Dynamics Institute
Country of publisher: United States
LCC subjects: Technology; Social Sciences
Website: https://journalprivacyconfidentiality.org/