PLoS ONE (Jan 2015)

A game theoretic framework for analyzing re-identification risk.

  • Zhiyu Wan,
  • Yevgeniy Vorobeychik,
  • Weiyi Xia,
  • Ellen Wright Clayton,
  • Murat Kantarcioglu,
  • Ranjit Ganta,
  • Raymond Heatherly,
  • Bradley A Malin

DOI
https://doi.org/10.1371/journal.pone.0120592
Journal volume & issue
Vol. 10, no. 3
p. e0120592

Abstract

Given the potential wealth of insights that large databases of personal data can provide, many organizations aim to share data while protecting privacy by sharing de-identified data, but are concerned because various demonstrations show that such data can be re-identified. Yet these investigations focus on how attacks can be perpetrated, not the likelihood they will be realized. This paper introduces a game theoretic framework that enables a publisher to balance re-identification risk with the value of sharing data, leveraging a natural assumption that a recipient only attempts re-identification if its potential gains outweigh the costs. We apply the framework to a real case study, where the value of the data to the publisher is the actual grant funding dollar amounts from a national sponsor and the re-identification gain of the recipient is the fine paid to a regulator for violation of federal privacy rules. There are three notable findings: 1) it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk; 2) the zero-risk solution enables sharing much more data than a commonly invoked de-identification policy of the U.S. Health Insurance Portability and Accountability Act (HIPAA); and 3) a sensitivity analysis demonstrates these findings are robust to order-of-magnitude changes in player losses and gains. In combination, these findings provide support that such a framework can enable pragmatic policy decisions about de-identified data sharing.
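
The decision logic summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the policy names, the payoff formula, and every number below are hypothetical, chosen only to show how a publisher might compare sharing policies when the recipient attacks only if the expected gain exceeds the cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """A hypothetical data-sharing policy (all values are illustrative)."""
    name: str
    value: float   # publisher's benefit from sharing under this policy
    p_reid: float  # assumed probability that a re-identification attack succeeds


def recipient_attacks(policy: Policy, gain: float, cost: float) -> bool:
    # Assumption from the abstract: the recipient attacks only if the
    # expected gain outweighs the cost of attempting re-identification.
    return policy.p_reid * gain > cost


def publisher_payoff(policy: Policy, gain: float, cost: float, loss: float) -> float:
    # Assumed payoff: sharing value minus the expected loss, where the loss
    # is incurred only if the recipient chooses to attack.
    if recipient_attacks(policy, gain, cost):
        return policy.value - policy.p_reid * loss
    return policy.value


def best_policy(policies, gain, cost, loss, zero_risk_only=False) -> Policy:
    # The publisher moves first and anticipates the recipient's response.
    # With zero_risk_only=True, policies that would invite an attack are excluded.
    candidates = [
        p for p in policies
        if not (zero_risk_only and recipient_attacks(p, gain, cost))
    ]
    return max(candidates, key=lambda p: publisher_payoff(p, gain, cost, loss))


# Hypothetical policies, ordered from most to least generalized data.
POLICIES = [
    Policy("coarse", value=40.0, p_reid=0.01),
    Policy("moderate", value=90.0, p_reid=0.04),
    Policy("fine", value=100.0, p_reid=0.06),
    Policy("raw", value=110.0, p_reid=0.50),
]

GAIN, COST, LOSS = 100.0, 5.0, 100.0  # hypothetical recipient gain, attack cost, publisher loss

optimal = best_policy(POLICIES, GAIN, COST, LOSS)
zero_risk = best_policy(POLICIES, GAIN, COST, LOSS, zero_risk_only=True)
print(optimal.name, publisher_payoff(optimal, GAIN, COST, LOSS))
print(zero_risk.name, publisher_payoff(zero_risk, GAIN, COST, LOSS))
```

With these made-up numbers the unrestricted optimum tolerates a small chance of attack, while the best zero-risk policy gives up only a little value. That is the shape of the trade-off the abstract describes, not a reproduction of its results.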