Journal of King Saud University: Computer and Information Sciences (Apr 2022)
Improved l-diversity: Scalable anonymization approach for Privacy Preserving Big Data Publishing
Abstract
In the era of big data analytics, data owners are increasingly concerned about data privacy. Data anonymization approaches such as k-anonymity, l-diversity, and t-closeness have long been used to preserve privacy in published data. However, these approaches are not directly applicable to large volumes of data. Distributed programming frameworks such as MapReduce and Spark are used for big data analytics, which adds further challenges to privacy-preserving data publishing. Recently, we identified a few scalable approaches for Privacy Preserving Big Data Publishing in the literature, the majority of which are based on k-anonymity and l-diversity. However, these approaches require significant improvement to reach the level of existing privacy-preserving data publishing approaches. Therefore, in this paper we propose the Improved Scalable l-Diversity (ImSLD) approach, an extension of Improved Scalable k-Anonymity (ImSKA), for scalable anonymization. Our approaches are based on scalable k-anonymization and use MapReduce as the programming paradigm. We use the poker dataset and synthesized big data versions of it to test our approaches. The result analysis shows a significant improvement in running time, owing to fewer MapReduce iterations, and lower information loss than existing approaches at the same level of privacy, owing to the tight arrangement of records in the initial equivalence classes.
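For readers unfamiliar with the two privacy models named above, the following is a minimal, single-machine sketch of what k-anonymity and l-diversity require of a published table. It is not the ImSLD or ImSKA algorithm and involves no MapReduce. It assumes the simplest "distinct" variant of l-diversity (each equivalence class contains at least l different sensitive values), and the function names, field names, and toy records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        classes[key].append(record)
    return classes


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class holds at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every equivalence class has at least l distinct sensitive values.

    Assumption: this is the 'distinct' l-diversity variant, not the
    entropy or recursive (c, l) variants.
    """
    classes = equivalence_classes(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in classes.values()
    )


# Hypothetical, already-generalized toy table (not the poker dataset).
records = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "cancer"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]
quasi_identifiers = ["age", "zip"]

k_ok = is_k_anonymous(records, quasi_identifiers, k=2)
l_ok = is_distinct_l_diverse(records, quasi_identifiers, "disease", l=2)
```

In this toy table both equivalence classes contain two records, so the table is 2-anonymous, but the second class contains only one distinct sensitive value, so it is not 2-diverse. This gap is why l-diversity is treated as a stronger requirement than k-anonymity.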