IEEE Access (Jan 2024)

Solving the Privacy-Equity Trade-off in Data Sharing By Using Homophily, Diversity, and t-Closeness Based Anonymity Algorithm

  • Abdul Majeed,
  • Seong Oun Hwang

DOI
https://doi.org/10.1109/ACCESS.2024.3510332
Journal volume & issue
Vol. 12
pp. 181953 – 181974

Abstract

In the modern era, personal data published by data owners play a vital role in decision-making, resource allocation, disease mitigation, and epidemiological analysis. However, if the published data do not truly reflect the characteristics of the underlying population from all perspectives, informed decisions cannot be made, leading to disproportionately fewer benefits for some marginal populations. Unfortunately, existing anonymization techniques often dilute or erase the representation of various populations, especially minor and super-minor groups, in the anonymized data, which inadvertently propagates inequity in subsequently published data analytics. To address these technical problems, in this paper, we implement a homophily-, diversity-, and t-closeness-based anonymity algorithm that effectively solves the privacy-equity trade-off (i.e., preserving privacy while representing all population groups, regardless of major/minor status) in anonymized data. We implement an automated method to identify equity-vulnerable attributes in the original data and to protect their values against dilution or deletion. We develop a clustering method that considers both homophily and diversity among records and that constructs compact, diverse, and balanced clusters. We employ the t-closeness principle to ensure a balanced distribution of equity-vulnerable attribute values in all clusters. We implement a flexible generalization scheme that performs only the required generalization of attributes to keep functional relationships similar between anonymized and real data. Rigorous experiments are performed on seven real-life benchmark datasets to justify the feasibility of our algorithm. Compared with the state-of-the-art method, our algorithm can lower privacy risks by up to 41.98% and enhance data quality by up to 36.72%. From an equity preservation point of view, it shows a 28.43% improvement over its counterpart, and equity losses in most cases are marginally lower than those of the original data.
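
To make the t-closeness principle mentioned in the abstract concrete, the following is a minimal sketch of a t-closeness check for one categorical attribute. It is not the authors' algorithm. It assumes that each cluster is a non-empty list of attribute values and that the distance between distributions is the variational distance (half the L1 distance), which is what the Earth Mover's Distance reduces to when all pairs of categorical values are equally far apart. The function names `distribution`, `variational_distance`, and `satisfies_t_closeness` are hypothetical and chosen for illustration only.

```python
from collections import Counter


def distribution(values, domain):
    """Relative frequency of each domain value in a non-empty list."""
    counts = Counter(values)
    n = len(values)
    return {v: counts.get(v, 0) / n for v in domain}


def variational_distance(p, q):
    """Half the L1 distance between two distributions over the same domain."""
    return 0.5 * sum(abs(p[v] - q[v]) for v in p)


def satisfies_t_closeness(clusters, t):
    """Return True if every cluster's distribution of the attribute is
    within distance t of the distribution over all records.

    Assumption: `clusters` is a list of non-empty lists of values of a
    single categorical attribute (e.g., an equity-vulnerable attribute).
    """
    all_values = [v for cluster in clusters for v in cluster]
    domain = sorted(set(all_values))
    overall = distribution(all_values, domain)
    return all(
        variational_distance(distribution(cluster, domain), overall) <= t
        for cluster in clusters
    )


# Hypothetical toy data: two clusters over attribute values "a" and "b".
clusters = [["a", "a", "b"], ["a", "b", "b"]]
loose = satisfies_t_closeness(clusters, t=0.2)
strict = satisfies_t_closeness(clusters, t=0.1)
```

In this toy example the overall distribution is evenly split between the two values, while each cluster is split two to one, so each cluster lies at distance 1/6 from the overall distribution. The check therefore passes for t = 0.2 and fails for t = 0.1. A smaller t forces every cluster to mirror the overall distribution more closely.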

Keywords