Privacy Protection from Sampling and Perturbation in Survey Microdata

Natalie Shlomo; Chris J. Skinner

doi:10.29012/jpc.v4i1.615

The Journal of Privacy and Confidentiality (Jul 2012)

Affiliations

Natalie Shlomo: Southampton Statistical Sciences Research Institute, University of Southampton, Highfield, Southampton, UK
Chris J. Skinner: Department of Statistics, London School of Economics and Political Science, London, UK

Journal volume & issue: Vol. 4, no. 1

Abstract

Statistical agencies release microdata from social surveys as public-use files after applying statistical disclosure limitation (SDL) techniques. Disclosure risk is typically assessed in terms of identification risk, where it is supposed that small counts in the cross-classification of identifying key variables, i.e. a key, could be used to identify an individual and so allow confidential information to be learnt. In this paper we explore the application of definitions of privacy from the computer science literature to the same problem, with a focus on sampling and on a form of perturbation which can be represented as misclassification. We consider two privacy definitions: differential privacy and probabilistic differential privacy. Chaudhuri and Mishra (2006) have shown that sampling does not guarantee differential privacy, but that, under certain conditions, it may ensure probabilistic differential privacy. We discuss these definitions and conditions in the context of survey microdata. We then extend this discussion to the case of perturbation. We show that differential privacy can be ensured if and only if the perturbation employs a misclassification matrix with no zero entries. We also show that probabilistic differential privacy is a viable alternative to differential privacy when there are zeros in the misclassification matrix. We discuss some common examples of SDL methods, for some of which zeros may be prevalent in the misclassification matrix.
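
The link between zero entries in the misclassification matrix and differential privacy can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes that each record is perturbed independently, that sampling is ignored, and that every category can actually be released (no all-zero column). Under those assumptions, changing one record's true category from i to i' changes the probability of any released category j by the factor M[i][j] / M[i'][j], so the smallest epsilon that bounds all such ratios is the largest log-ratio within any column. The function name dp_epsilon and the two example matrices are hypothetical.

```python
import math

def dp_epsilon(M):
    """Smallest epsilon bounding all within-column ratios of M.

    M[i][j] is read as P(released category = j | true category = i),
    so each row sums to 1. Assumption (not from the paper): records are
    perturbed independently and every column has a positive entry.
    """
    eps = 0.0
    for j in range(len(M[0])):
        column = [row[j] for row in M]
        largest, smallest = max(column), min(column)
        if smallest == 0:
            # Some true category can never be released as j, so the
            # probability ratio for this output is unbounded.
            return math.inf
        eps = max(eps, math.log(largest / smallest))
    return eps

# Illustrative matrix with no zero entries: epsilon is finite.
M_full = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]

# Illustrative matrix with structural zeros: no finite epsilon exists.
M_zero = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]

print(dp_epsilon(M_full))
print(dp_epsilon(M_zero))
```

For the first matrix the bound is log(0.8 / 0.1), roughly 2.08; for the second it is infinite, which is the situation in which the abstract points to probabilistic differential privacy as the alternative.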

Published in The Journal of Privacy and Confidentiality

ISSN: 2575-8527 (Online)
Publisher: Labor Dynamics Institute
Country of publisher: United States
LCC subjects: Technology; Social Sciences
Website: https://journalprivacyconfidentiality.org/
