IEEE Access (Jan 2019)

DPRF: A Differential Privacy Protection Random Forest

  • Jun Hou,
  • Qianmu Li,
  • Shunmei Meng,
  • Zhen Ni,
  • Yini Chen,
  • Yaozong Liu

DOI
https://doi.org/10.1109/ACCESS.2019.2939891
Journal volume & issue
Vol. 7
pp. 130707 – 130720

Abstract

Privacy protection for classification algorithms has become an active research topic in data mining. This paper applies differential privacy to the random forest classification algorithm and proposes a differential privacy protection random forest (DPRF) that protects private information during data classification. Differential privacy provides protection by adding perturbation noise, which lowers the classification accuracy of random forest algorithms. To reduce this loss of accuracy, the paper makes two contributions. First, it proposes a hybrid decision tree algorithm: when constructing a single decision tree in the forest, the information gain used by the ID3 algorithm and the information gain ratio used by C4.5 are combined into a new attribute metric, IG_GR, to improve the classification accuracy of a single decision tree. Second, it proposes a new privacy budget allocation strategy: for nodes at different depths in the decision tree, the privacy budget is divided by weight between the node's counting query and its attribute query, which balances the signal-to-noise ratio across nodes at different depths. The hybrid decision tree algorithm is then applied to the construction of the random forest, balancing the privacy and the classification accuracy of the differentially private random forest. Experiments on the UCI Adult and Mushroom datasets show that the proposed algorithm achieves better classification accuracy than the traditional decision tree algorithm, and that DPRF provides effective privacy protection while maintaining high classification performance. The work thus balances privacy and classification accuracy and has practical application value.
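
As a rough illustration of the ingredients named in the abstract, the following Python sketch computes the two standard split metrics (information gain as in ID3, gain ratio as in C4.5) and answers a counting query with Laplace noise. The abstract does not state how IG_GR combines the two metrics or how the per-depth budget weights are chosen, so neither is reproduced here. All function names are hypothetical, and the Laplace mechanism with sensitivity 1 is the textbook choice for counts, assumed rather than taken from the paper.

```python
import math
import random
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """ID3-style metric: entropy reduction from splitting on an attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5-style metric: information gain normalised by split information."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split
    return information_gain(values, labels) / split_info


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two Exp(1) draws, rescaled."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count, epsilon, rng):
    """Counting query under epsilon-DP (assumes sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    attribute = ["a", "a", "b", "b"]
    labels = [0, 0, 1, 1]
    rng = random.Random(0)
    print("IG:", information_gain(attribute, labels))
    print("GR:", gain_ratio(attribute, labels))
    print("noisy count:", noisy_count(100, epsilon=0.5, rng=rng))
```

A smaller epsilon gives a larger noise scale (1/epsilon), so a node that receives little budget returns noisier counts. This is the signal-to-noise trade-off that the budget allocation strategy is meant to balance across depths.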

Keywords