IEEE Access (Jan 2018)

Does k-Anonymous Microaggregation Affect Machine-Learned Macrotrends?

  • Ana Rodríguez-Hoyos,
  • José Estrada-Jiménez,
  • David Rebollo-Monedero,
  • Javier Parra-Arnau,
  • Jordi Forné

DOI
https://doi.org/10.1109/ACCESS.2018.2834858
Journal volume & issue
Vol. 6
pp. 28258 – 28277

Abstract


In the era of big data, the availability of massive amounts of information makes privacy protection more necessary than ever. Among a variety of anonymization mechanisms, microaggregation is a common approach to satisfy the popular requirement of k-anonymity in statistical databases. In essence, k-anonymous microaggregation aggregates quasi-identifiers to hide the identity of each data subject within a group of k - 1 other subjects. Like any perturbative mechanism, however, anonymization comes at the cost of some information loss that may hinder the ulterior purpose of the released data, which very often is building machine-learning models for macrotrend analysis. To assess the impact of microaggregation on the utility of the anonymized data, it is necessary to evaluate the resulting accuracy of said models. In this paper, we address the problem of measuring the effect of k-anonymous microaggregation on the empirical utility of microdata. We quantify utility accordingly as the accuracy of classification models learned from microaggregated data and evaluated on original test data. Our experiments indicate, with some consistency, that the impact of the de facto microaggregation standard (maximum distance to average vector, MDAV) on the performance of machine-learning algorithms is often minor to negligible for a wide range of k, across a variety of classification algorithms and data sets. Furthermore, experimental evidence suggests that the traditional measure of distortion in the microdata-anonymization community may be inappropriate for evaluating the utility of microaggregated data.
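The abstract describes a concrete evaluation protocol: microaggregate the training records with MDAV so that every quasi-identifier value is replaced by its group centroid, learn a classifier on the anonymized training set, and measure accuracy on untouched original test data. The sketch below is an illustrative reconstruction of that pipeline, not the authors' code; it assumes scikit-learn, treats every feature as a quasi-identifier, and uses the breast-cancer data set, a random forest, and the listed values of k purely as stand-ins.

```python
# Minimal sketch of k-anonymous microaggregation (MDAV heuristic) followed by
# the train-on-anonymized / test-on-original evaluation described in the abstract.
# Dataset, classifier, and k values are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def mdav(X, k):
    """Replace each record by the centroid of its MDAV group of size >= k."""
    X = np.asarray(X, dtype=float)
    unassigned = set(range(len(X)))
    clusters = []

    def extract_cluster(seed):
        # The seed record plus its k-1 nearest still-unassigned neighbours.
        members = sorted(unassigned,
                         key=lambda i: np.linalg.norm(X[i] - X[seed]))[:k]
        unassigned.difference_update(members)
        clusters.append(members)

    while len(unassigned) >= 3 * k:
        centroid = X[list(unassigned)].mean(axis=0)
        # r: record farthest from the average vector; s: record farthest from r.
        r = max(unassigned, key=lambda i: np.linalg.norm(X[i] - centroid))
        s = max(unassigned, key=lambda i: np.linalg.norm(X[i] - X[r]))
        extract_cluster(r)
        extract_cluster(s)
    if len(unassigned) >= 2 * k:
        centroid = X[list(unassigned)].mean(axis=0)
        r = max(unassigned, key=lambda i: np.linalg.norm(X[i] - centroid))
        extract_cluster(r)
    clusters.append(list(unassigned))  # remaining k..2k-1 records form one group

    X_anon = X.copy()
    for members in clusters:
        X_anon[members] = X[members].mean(axis=0)  # centroid substitution
    return X_anon


# Evaluation protocol: anonymize only the training features, then score the
# resulting model on the original (non-anonymized) test records.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for k in (2, 5, 10, 50):
    X_tr_anon = mdav(X_tr, k)
    clf = RandomForestClassifier(random_state=0).fit(X_tr_anon, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"k={k:3d}  accuracy on original test data: {acc:.3f}")
```

Comparing each accuracy against the k = 1 (no anonymization) baseline is the kind of measurement the paper uses to argue that MDAV's impact on machine-learned macrotrends is often minor for a wide range of k.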

Keywords