DPRF: A Differential Privacy Protection Random Forest

Jun Hou; Qianmu Li; Shunmei Meng; Zhen Ni; Yini Chen; Yaozong Liu

doi:10.1109/ACCESS.2019.2939891

IEEE Access (Jan 2019)

Affiliations

Jun Hou: ORCiD; Nanjing Institute of Industry Technology, Nanjing, China
Qianmu Li: ORCiD; Intelligent Manufacturing Department, Wuyi University, Jiangmen, China
Shunmei Meng: ORCiD; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Zhen Ni: ORCiD; School of Information Engineering, Nanjing Xiaozhuang University, Nanjing, China
Yini Chen: School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Yaozong Liu: Intelligent Manufacturing Department, Wuyi University, Jiangmen, China

DOI: https://doi.org/10.1109/ACCESS.2019.2939891
Journal volume & issue: Vol. 7
pp. 130707 – 130720

Abstract

Providing privacy protection for classification algorithms has become an active research topic in data mining. This paper applies differential privacy to the random forest classification algorithm and proposes a random forest algorithm based on differential privacy (DPRF) that protects private information during data classification. Differential privacy protects data by adding perturbation noise, which reduces the classification accuracy of random forest algorithms. To limit this loss of accuracy, the paper makes two contributions. First, it proposes a hybrid decision tree algorithm: when constructing a single decision tree in the random forest, the information gain used by the ID3 algorithm and the information gain ratio used by C4.5 are combined into a new attribute metric, IG_GR, which improves the classification accuracy of a single decision tree. Second, it proposes a new privacy budget allocation strategy: for nodes at different depths of the decision tree, the privacy budget is divided by weight between the node's counting query and its attribute query, which balances the signal-to-noise ratio of the differential privacy mechanism across depths. The hybrid decision tree algorithm is then applied to the construction of the random forest, balancing privacy against classification accuracy. Finally, experiments were conducted on the Adult and Mushroom datasets from the UCI repository. The results show that the proposed algorithm achieves better classification accuracy than the traditional decision tree algorithm, and that DPRF provides effective privacy protection while maintaining high classification performance. The work thus achieves a balance between privacy and classification accuracy and has practical application value.
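
As a rough illustration of the building blocks named in the abstract, the following Python sketch computes the two standard attribute metrics (information gain as in ID3, gain ratio as in C4.5) and perturbs a counting query with Laplace noise. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how IG_GR combines the two metrics or how the depth-dependent budget weights are chosen, so neither is reproduced here. The function names, the sensitivity of 1 for counting queries, and the use of the Laplace mechanism are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def information_gain(values, labels):
    """ID3-style information gain of splitting `labels` by attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the label distribution after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5-style gain ratio: information gain divided by split information."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split
    return information_gain(values, labels) / split_info


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Counting query perturbed with Laplace noise.

    Assumes sensitivity 1 (adding or removing one record changes the
    count by at most 1), so the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A smaller epsilon gives a larger noise scale, which is why nodes that receive a small share of the privacy budget return less reliable counts. This is the trade-off that the budget allocation strategy described in the abstract is meant to balance.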

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
