A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First

Jianzhang Chen; Shuo Zhou; Jie Qiu; Yixin Xu; Bozhe Zeng; Wanchuan Fang; Xiangying Chen; Yipeng Huang; Zhengquan Xu; Youqin Chen

doi:10.3390/a17070293

Algorithms (Jul 2024)

A Histogram Publishing Method under Differential Privacy That Involves Balancing Small-Bin Availability First

Jianzhang Chen,
Shuo Zhou,
Jie Qiu,
Yixin Xu,
Bozhe Zeng,
Wanchuan Fang,
Xiangying Chen,
Yipeng Huang,
Zhengquan Xu,
Youqin Chen

Affiliations

Jianzhang Chen: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Shuo Zhou: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Jie Qiu: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Yixin Xu: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Bozhe Zeng: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Wanchuan Fang: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Xiangying Chen: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Yipeng Huang: College of Computer and Information Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
Zhengquan Xu: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
Youqin Chen: State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China

DOI: https://doi.org/10.3390/a17070293
Journal volume & issue: Vol. 17, no. 7
p. 293

Abstract

Read online

Differential privacy, a cornerstone of privacy-preserving techniques, plays an indispensable role in ensuring the secure handling and sharing of sensitive data analysis across domains such as in census, healthcare, and social networks. Histograms, serving as a visually compelling tool for presenting analytical outcomes, are widely employed in these sectors. Currently, numerous algorithms for publishing histograms under differential privacy have been developed, striving to balance privacy protection with the provision of useful data. Nonetheless, the pivotal challenge concerning the effective enhancement of precision for small bins (those intervals that are narrowly defined or contain a relatively small number of data points) within histograms has yet to receive adequate attention and in-depth investigation from experts. In standard DP histogram publishing, adding noise without regard for bin size can result in small data bins being disproportionately influenced by noise, potentially severely impairing the overall accuracy of the histogram. In response to this challenge, this paper introduces the SReB_GCA sanitization algorithm designed to enhance the accuracy of small bins in DP histograms. The SReB_GCA approach involves sorting the bins from smallest to largest and applying a greedy grouping strategy, with a predefined lower bound on the mean relative error required for a bin to be included in a group. Our theoretical analysis reveals that sorting bins in ascending order prior to grouping effectively prioritizes the accuracy of smaller bins. SReB_GCA ensures strict ϵ-DP compliance and strikes a careful balance between reconstruction error and noise error, thereby not only initially improving the accuracy of small bins but also approximately optimizing the mean relative error of the entire histogram. To validate the efficiency of our proposed SReB_GCA method, we conducted extensive experiments using four diverse datasets, including two real-life datasets and two synthetic ones. The experimental results, quantified by the Kullback–Leibler Divergence (KLD), show that the SReB_GCA algorithm achieves substantial performance enhancement compared to the baseline method (DP_BASE) and several other established approaches for differential privacy histogram publication.

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms

About the journal

Abstract

Keywords