IEEE Access (Jan 2019)

Differentially Private Top-<inline-formula> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula> Frequent Columns Publication for High-Dimensional Data

  • Ning Wang,
  • Zhigang Wang,
  • Yu Gu,
  • Jia Xu,
  • Zhiqiang Wei,
  • Ge Yu

DOI
https://doi.org/10.1109/ACCESS.2019.2957762
Journal volume & issue
Vol. 7
pp. 177342 – 177353

Abstract

Read online

Differential privacy (DP) is a promising scheme for releasing the results of statistical queries on sensitive data. This paper focuses on top-k frequent columns publication on sensitive data, with high result utility under differential privacy. Existing works directly select frequent columns from all columns (called one-phase scheme), which is far from ideal due to the large privacy consumption or misjudgments for columns with frequencies close to the frequency of the k th frequent column (called near-k-fluctuation-column). This paper presents a new solution Two Phase Selection (TPS) to carefully choose frequent columns in two phases. The main idea is to classify columns into two distinct categories based on whether it is one near-k-fluctuation-column or not. Frequent columns are chosen from the two categories using different techniques, which is totally different from existing solutions without classifying. Furthermore, by analyzing the distribution of near-k-fluctuation-columns, we introduce a block-centric column-choosing method privacy-free-mechanism (PFM). By partitioning columns into blocks, PFM makes the privacy consumption proportional to the number of blocks, instead of frequent columns. Extensive experiments on real datasets show that our proposals outperform the state-of-the-art techniques for top-k column publication.

Keywords