IEEE Access (Jan 2023)

When AI Meets Information Privacy: The Adversarial Role of AI in Data Sharing Scenario

  • Abdul Majeed,
  • Seong Oun Hwang

DOI
https://doi.org/10.1109/ACCESS.2023.3297646
Journal volume & issue
Vol. 11
pp. 76177 – 76195

Abstract

Read online

Artificial intelligence (AI) is a transformative technology with a substantial number of practical applications in commercial sectors such as healthcare, finance, aviation, and smart cities. AI also has strong synergy with the information privacy (IP) domain from two distinct aspects: as a protection tool (i.e., safeguarding privacy), and as a threat tool (i.e., compromising privacy). In the former case, AI techniques are amalgamated with the traditional anonymization techniques to improve various key components of the anonymity process, and therefore, privacy is safeguarded effectively. In the latter case, some adversarial knowledge is aggregated with the help of AI techniques and subsequently used to compromise the privacy of individuals. To the best of our knowledge, threats posed by AI-generated knowledge such as synthetic data (SD) to information privacy are often underestimated, and most of the existing anonymization methods do not consider/model this SD-based knowledge that can be available to the adversary, leading to privacy breaches in some cases. In this paper, we highlight the role of AI as a threat tool (i.e., AI used to compromise an individual’s privacy), with a special focus on SD that can serve as background knowledge leading to various kinds of privacy breaches. For instance, SD can encompass pertinent information (e.g., total # of attributes in data, distributions of sensitive information, category values of each attribute, minor and major values of some attributes, etc.) about real data that can offer a helpful hint to the adversary regarding the composition of anonymized data, that can subsequently lead to uncovering the identity or private information. We perform reasonable experiments on a real-life benchmark dataset to prove the pitfalls of AI in the data publishing scenario (when a database is either fully or partially released to public domains for conducting analytics).

Keywords