Journal of Probability and Statistics (Jan 2025)

A Method of Complex Disclosure Risk Assessment for Microdata

  • Andrzej Młodak

DOI
https://doi.org/10.1155/jpas/1876232
Journal volume & issue
Vol. 2025

Abstract

This article proposes a method for constructing an aggregate measure of disclosure risk, that is, the risk that a user or an intruder can derive an individual’s confidential information from a given data set. The measure combines components for categorical and continuous variables, so that threats to data confidentiality are identified as fully as possible. Its construction relies on the frequency approach. For a continuous variable, the frequency is the number of observed values of that variable that belong to the neighborhood of the considered value, as determined by an arbitrarily defined precision level for reidentification. Moreover, the Shapley and solidarity values (two alternative solutions in cooperative game theory with properties that make them effective tools for this purpose) are employed to assess the contribution of particular variables to the total individual and global risk, using the idea of minimum unsafe combinations. To some extent, this proposal refers to the Special Uniques Detection Algorithm (SUDA) and may function as its extension toward computing an overall risk that takes into account both categorical and continuous variables. The complex measure can reflect the actual level of disclosure risk better than commonly used tools, which address categorical and continuous quasi-identifiers separately. Moreover, the measures for the latter type are few and rather difficult to interpret. The solution presented in the article aims to overcome these problems. A simulation study and an assessment of disclosure risk for microdata from the Adult Person Survey within the Balance of Human Capital project in Poland confirm the utility of the proposed measures.
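
The two ingredients named in the abstract, frequency counting within a precision level and Shapley-based attribution of risk to variables, can be illustrated with a minimal Python sketch. It is not the article's measure: the toy records, the `precision` setting, and the characteristic function `worth` (1 if a set of variables makes the target record unique, 0 otherwise) are assumptions made for illustration only, and all function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def neighborhood_counts(values, precision):
    """For each value, count the observations lying within +/- precision of it
    (the value itself is included in the count)."""
    return [sum(1 for w in values if abs(w - v) <= precision) for v in values]


def matches(a, b, var, precision):
    """Continuous variables match within their precision; others match exactly."""
    if var in precision:
        return abs(a[var] - b[var]) <= precision[var]
    return a[var] == b[var]


def frequency(records, target, variables, precision):
    """Number of records indistinguishable from `target` on `variables`."""
    return sum(
        1 for r in records
        if all(matches(r, target, v, precision) for v in variables)
    )


def shapley_values(players, worth):
    """Exact Shapley value of each player for the characteristic function `worth`,
    which maps a frozenset of players to a number."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Probability weight of a coalition of size k preceding player p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (worth(s | {p}) - worth(s))
        phi[p] = total
    return phi


# Toy microdata (hypothetical): two categorical variables and one continuous.
records = [
    {"sex": "F", "region": "N", "income": 1000.0},
    {"sex": "F", "region": "N", "income": 1040.0},
    {"sex": "M", "region": "N", "income": 2500.0},
    {"sex": "F", "region": "S", "income": 1020.0},
]
precision = {"income": 50.0}  # assumed precision level for reidentification
variables = ["sex", "region", "income"]
target = records[2]


def worth(subset):
    # Assumed toy risk: 1 if the subset of variables singles out the target.
    return 1.0 if frequency(records, target, subset, precision) == 1 else 0.0


income_counts = neighborhood_counts([r["income"] for r in records], 50.0)
contributions = shapley_values(variables, worth)
```

For the third record, `sex` and `income` each single it out on their own while `region` does not, so this toy attribution splits the unit risk equally between `sex` and `income` (0.5 each) and assigns 0 to `region`. The exact enumeration is exponential in the number of variables, so it is suitable only for a handful of quasi-identifiers.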