IEEE Access (Jan 2020)
Rapid Re-Identification Risk Assessment for Anonymous Data Set in Mobile Multimedia Scene
Abstract
Ubiquitous mobile multimedia applications bring great convenience to users. However, when enjoying mobile multimedia services, users provide personal data to service platforms. Although the service platforms always claim that the collected personal data are de-identified, the risk of re-identifying users through linkage attacks still exists and is incalculable. This paper proposes a rapid prediction model for the overall re-identification risk based on the statistics of data sets (i.e., the number of individuals, number of attributes, distribution of attribute values, and attribute dependency). Our proposed model reveals the impact of statistics on the overall re-identification risk and adopts random sampling and semi-random sampling methods to predict the overall re-identification risk of data sets with and without strong dependency ordered attribute pairs. Experimental results show that for the data sets without strong dependency ordered attribute pairs, the random sampling method has a high prediction accuracy (the prediction error is less than 0.05). For the data sets with strong dependency ordered attribute pairs, the semi-random sampling method has a high prediction accuracy (the prediction error is less than 0.09). Exploiting our model, governments and individuals can quickly assess the privacy leakage risk of their data sets, given only the statistic of the data sets. Besides, this model can also evaluate the privacy risk of data collection schemes in advance according to historical statistics, and identify suspected services.
Keywords