IEEE Access (Jan 2018)
PPDCA: Privacy-Preserving Crowdsourcing Data Collection and Analysis With Randomized Response
Abstract
Randomized response mechanisms for guaranteeing the privacy of crowdsourced data have attracted scholarly attention: aggregators can protect privacy by collecting only randomized data, and individuals retain plausible deniability regarding their responses. With these mechanisms, analysts employed by organizations can still make predictions and conduct analyses on the randomized data. However, existing randomized-response-based data collection solutions offer severely restricted functionality and usability, which makes them impractical and inefficient. We therefore developed a randomized-response-based mechanism for privacy-preserving crowdsourcing data collection and analysis. We designed a complementary randomized response (C-RR) method that guarantees individuals' data privacy while preserving features of the original data for analysis. We also formalized a machine learning framework in which randomized data, encoded as binary vectors, are used to build a learning network. Extensive experiments on real-world data sets demonstrated that our heavy-hitters estimation scheme, which combines C-RR with our data learning model, significantly outperformed existing estimation schemes in terms of data analysis.
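The abstract does not specify how C-RR perturbs the data, so the following sketch does not reproduce it. For reference only, it shows the classic Warner-style randomized response applied bit by bit to one-hot binary vectors, the standard unbiased frequency estimator an analyst can run on the randomized reports, and a naive top-k heavy-hitters selection. The function names and the truth probability `p` are illustrative assumptions, not the paper's API.

```python
import random


def randomize_bit(bit: int, p: float, rng: random.Random) -> int:
    """Warner-style randomized response (not C-RR): report the true bit
    with probability p, otherwise report its complement."""
    return bit if rng.random() < p else 1 - bit


def randomize_vector(vec: list[int], p: float, rng: random.Random) -> list[int]:
    """Perturb every position of a binary vector independently."""
    return [randomize_bit(b, p, rng) for b in vec]


def estimate_frequencies(reports: list[list[int]], p: float) -> list[float]:
    """Unbiased estimate of the true fraction of 1s at each position.

    If pi is the true fraction, the expected observed fraction is
    lambda = p*pi + (1 - p)*(1 - pi) = (2p - 1)*pi + (1 - p),
    so pi_hat = (lambda - (1 - p)) / (2p - 1). Requires p != 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 carries no information about the true data")
    n, d = len(reports), len(reports[0])
    estimates = []
    for j in range(d):
        lam = sum(r[j] for r in reports) / n  # observed fraction of 1s
        estimates.append((lam - (1 - p)) / (2 * p - 1))
    return estimates


def heavy_hitters(estimates: list[float], k: int) -> list[int]:
    """Indices of the k positions with the largest estimated frequency."""
    return sorted(range(len(estimates)), key=lambda j: estimates[j], reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    d, p = 5, 0.75
    # Hypothetical population: 60% hold item 0, 30% item 1, 10% item 2.
    items = [0] * 6000 + [1] * 3000 + [2] * 1000
    reports = [randomize_vector([int(j == i) for j in range(d)], p, rng) for i in items]
    est = estimate_frequencies(reports, p)
    print([round(e, 3) for e in est])
    print(heavy_hitters(est, 2))
```

Each bit is reported truthfully only with probability `p`, so any single report is deniable, yet the aggregate remains estimable. Moving `p` toward 0.5 strengthens deniability but inflates the estimator's variance, which scales as 1/(n(2p - 1)^2); this accuracy loss is the kind of limitation that schemes such as C-RR aim to mitigate.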
Keywords