IEEE Access (Jan 2024)
Score, Arrange, and Cluster: A Novel Clustering-Based Technique for Privacy-Preserving Data Publishing
Abstract
Data-driven decision-making has become critical to every organization, and there is a growing emphasis on adopting robust data governance frameworks for data management. Data governance encompasses data publishing, which empowers stakeholders to access and analyze the published data and therefore plays a pivotal role in decision-making. However, data publishing poses a threat to the privacy of entity-specific information. Privacy-Preserving Data Publishing (PPDP) refers to publishing data while protecting the privacy of entity-specific information. K-anonymity is a well-recognized method for achieving PPDP and serves as the foundation of our proposed clustering-based data transformation algorithm, “Score, Arrange, and Cluster (SAC)”. For effective data management and decision-making in organizations, it is crucial to address the varying data requirements and role-based access levels of the stakeholders involved. SAC was designed to offer only a generic data transformation with minimal data quality degradation. Hence, this work presents an enhancement to SAC that takes stakeholder roles and requirements into account, as illustrated through different scenarios. The scoring mechanism in SAC is augmented either to accommodate customization or to use concepts from Genetic Algorithms to enforce role-based access control. The “Cost of Degradation” (CoD) metric is used to quantify data quality degradation. The experimental results show that, in the customization scenario, a higher attribute priority leads to lower data quality degradation, while, in the role-based access control scenario, a higher access level results in lower data quality degradation.
Keywords