IEEE Access (Jan 2021)

Review: Privacy-Preservation in the Context of Natural Language Processing

  • Darshini Mahendran,
  • Changqing Luo,
  • Bridget T. Mcinnes

DOI
https://doi.org/10.1109/ACCESS.2021.3124163
Journal volume & issue
Vol. 9
pp. 147600 – 147612

Abstract

Read online

Data privacy is one of the highly discussed issues in recent years as we encounter data breaches and privacy scandals often. This raises a lot of concerns about the ways the data is acquired and the potential information leaks. Especially in the field of Artificial Intelligence (AI), the widely using of AI models aggravates the vulnerability of user privacy because a considerable portion of user data that AI models used is represented in natural language. In the past few years, many researchers have proposed NLP-based methods to address these data privacy challenges. To the best of our knowledge, this is the first interdisciplinary review discussing privacy preservation in the context of NLP. In this paper, we present a comprehensive review of previous research conducted to gather techniques and challenges of building and testing privacy-preserving systems in the context of Natural Language Processing (NLP). We group the different works under four categories: 1) Data privacy in the medical domain, 2) Privacy preservation in the technology domain, 3) Analysis of privacy policies, and 4) Privacy leaks detection in the text representation. This review compares the contributions and pitfalls of the various privacy violation detection and prevention works done using NLP techniques to help guide a path ahead.

Keywords