Differentially Private Top-k Frequent Columns Publication for High-Dimensional Data

Ning Wang; Zhigang Wang; Yu Gu; Jia Xu; Zhiqiang Wei; Ge Yu

doi:10.1109/access.2019.2957762

IEEE Access (Jan 2019)

Affiliations

Ning Wang: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Zhigang Wang: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Yu Gu: College of Computer Science and Engineering, Northeastern University, Shenyang, China
Jia Xu: College of Computer Science and Engineering, Guangxi University, Nanning, China
Zhiqiang Wei: College of Information Science and Engineering, Ocean University of China, Qingdao, China
Ge Yu: College of Computer Science and Engineering, Northeastern University, Shenyang, China

DOI: https://doi.org/10.1109/access.2019.2957762
Journal volume & issue: Vol. 7
pp. 177342 – 177353

Abstract

Differential privacy (DP) is a promising scheme for releasing the results of statistical queries on sensitive data. This paper focuses on publishing the top-k frequent columns of sensitive data with high result utility under differential privacy. Existing works directly select frequent columns from all columns (called the one-phase scheme), which is far from ideal because of large privacy consumption or misjudgment of columns whose frequencies are close to the frequency of the k-th frequent column (called near-k-fluctuation-columns). This paper presents a new solution, Two Phase Selection (TPS), which carefully chooses frequent columns in two phases. The main idea is to classify columns into two distinct categories based on whether or not each column is a near-k-fluctuation-column. Frequent columns are then chosen from the two categories using different techniques, unlike existing solutions, which do not classify columns at all. Furthermore, by analyzing the distribution of near-k-fluctuation-columns, we introduce a block-centric column-choosing method called the privacy-free-mechanism (PFM). By partitioning columns into blocks, PFM makes the privacy consumption proportional to the number of blocks rather than to the number of frequent columns. Extensive experiments on real datasets show that our proposals outperform the state-of-the-art techniques for top-k column publication.
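To make the one-phase baseline concrete, here is a minimal Python sketch, assuming each record is a set of column names with non-zero entries and that neighboring datasets differ by one record, so every column count has sensitivity 1. It uses a standard peeling construction (k rounds of Report Noisy Max with Laplace noise, each round spending ε/k under sequential composition). It is not the TPS or PFM algorithm, whose details the abstract does not give. The function names and the `margin` parameter, which stands in for "close to the frequency of the k-th frequent column", are hypothetical.

```python
import random
from typing import Dict, List, Optional, Set


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def column_frequencies(rows: List[Set[str]]) -> Dict[str, int]:
    """Count how many records have a non-zero entry in each column."""
    freq: Dict[str, int] = {}
    for row in rows:
        for col in row:
            freq[col] = freq.get(col, 0) + 1
    return freq


def one_phase_top_k(freq: Dict[str, int], k: int, epsilon: float,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Baseline one-phase selection over ALL columns (peeling Report Noisy Max).

    Each of the k rounds spends epsilon / k, so the total is epsilon by
    sequential composition; as k grows, each round gets less budget and
    therefore more noise.
    """
    if rng is None:
        rng = random.Random()
    per_round = epsilon / k
    remaining = dict(freq)
    chosen: List[str] = []
    for _ in range(min(k, len(remaining))):
        # Counts have sensitivity 1, so add Laplace(1 / per_round) noise per column.
        noisy = {c: f + laplace_noise(1.0 / per_round, rng)
                 for c, f in remaining.items()}
        best = max(noisy, key=lambda c: noisy[c])
        chosen.append(best)
        del remaining[best]  # select without replacement
    return chosen


def near_k_columns(freq: Dict[str, int], k: int, margin: int) -> Set[str]:
    """Columns whose true count is within `margin` of the k-th largest count.

    `margin` is a hypothetical stand-in for "close to the k-th frequency".
    This reads true counts, so it is an illustration, not a private release.
    """
    if not 1 <= k <= len(freq):
        raise ValueError("k must be between 1 and the number of columns")
    kth = sorted(freq.values(), reverse=True)[k - 1]
    return {c for c, f in freq.items() if abs(f - kth) <= margin}
```

For example, with the records {a, b}, {a, c}, {a, b, d}, {b, c}, {a}, the counts are a: 4, b: 3, c: 2, d: 1, and `near_k_columns(freq, k=2, margin=1)` returns {a, b, c}. Because each round's noise scale is k/ε, columns inside such a band are the ones most easily misordered by the baseline, which illustrates why a one-phase scheme can misjudge near-k-fluctuation-columns.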

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
Keywords