Discover Artificial Intelligence (Dec 2021)

Synthetic data use: exploring use cases to optimise data utility

  • Stefanie James,
  • Chris Harbron,
  • Janice Branson,
  • Mimmi Sundler

DOI
https://doi.org/10.1007/s44163-021-00016-y
Journal volume & issue
Vol. 1, no. 1
pp. 1 – 13

Abstract

Synthetic data is a rapidly evolving field attracting growing interest from multiple industry stakeholders and European bodies. The pharmaceutical industry in particular is starting to realise the value of synthetic data, which is increasingly used to optimise data utility and sharing and, ultimately, as an innovative response to the growing demand for improved privacy. Synthetic data is data generated by simulation, based upon and mirroring the properties of an original dataset. Here, with supporting viewpoints from across the pharmaceutical industry, we explore use cases for synthetic data across seven key but related areas for optimising data utility for improved data privacy and protection. We also discuss the various methods that can be used to produce a synthetic dataset and the metrics available to ensure the quality of generated synthetic datasets. Lastly, we discuss the potential merits, challenges and future direction of synthetic data within the pharmaceutical industry, and the considerations for this privacy-enhancing technology.

Keywords