Journal of Big Data (Jun 2025)

An accuracy-privacy optimization framework considering user’s privacy requirements for data stream mining

  • Waruni Hewage,
  • R. Sinha,
  • M. Asif Naeem

DOI
https://doi.org/10.1186/s40537-025-01147-0
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 32

Abstract

Data stream mining is a critical process that organizations use to derive insights from real-time data. However, preserving the privacy of sensitive information while maintaining high accuracy remains a persistent challenge. Privacy-preserving data mining techniques modify data to increase privacy, a process that invariably decreases the accuracy of data mining algorithms. Although different techniques have been proposed to preserve privacy, there is a lack of well-formulated frameworks to optimize the trade-off between accuracy and privacy. This paper introduces a novel Accuracy-Privacy Optimization Framework (APOF) that allows users to define privacy requirements and predicts achievable accuracy levels, enabling fine-tuning of this balance. Logistic cumulative noise addition, which has experimentally shown better performance, is used as the data perturbation method, and Hoeffding trees are used as the classifier. Additionally, a data fitting module using kernel regression is integrated, a unique approach that predicts accuracy levels based on user-defined privacy thresholds. Experimental results show that the proposed framework achieves an optimal privacy level above 97% while minimizing the accuracy loss across various datasets. By addressing critical gaps in privacy-preserving data mining, this study offers significant contributions to real-world applications, facilitating secure and efficient data utilization in dynamic environments.
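
The data fitting idea mentioned in the abstract, predicting accuracy from a user-defined privacy threshold with kernel regression, can be illustrated with a minimal sketch. The abstract does not specify the estimator, kernel, or bandwidth, so the Nadaraya-Watson form with a Gaussian kernel, the function name `kernel_regress`, the bandwidth value, and the sample points below are all assumptions made for illustration; the numbers are invented and are not results from the paper.

```python
import math


def kernel_regress(x_query, xs, ys, bandwidth=0.05):
    """Nadaraya-Watson estimate of y at x_query using a Gaussian kernel.

    Assumed form for illustration; the paper's data fitting module may differ.
    """
    # Weight each observation by its kernel distance to the query point.
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query is too far from all observations for this bandwidth")
    # The prediction is the kernel-weighted average of the observed values.
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical (privacy level, accuracy) observations, invented for this sketch.
privacy = [0.80, 0.85, 0.90, 0.95, 0.99]
accuracy = [0.92, 0.90, 0.87, 0.82, 0.74]

# Estimate the accuracy a user might expect at a requested privacy level of 0.97.
predicted = kernel_regress(0.97, privacy, accuracy)
print(f"predicted accuracy at privacy 0.97: {predicted:.3f}")
```

Because the estimate is a weighted average of observed accuracies, it always lies between the smallest and largest observed value; a smaller bandwidth follows nearby observations more closely, while a larger one smooths across the whole range.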

Keywords