Big Data & Society (Jun 2024)

Generating reality and silencing debate: Synthetic data as discursive device

  • Paula Helm,
  • Benjamin Lipp,
  • Roser Pujadas

DOI
https://doi.org/10.1177/20539517241249447
Journal volume & issue
Vol. 11

Abstract

Data for artificial intelligence is no longer only tapped from users’ behavioral surplus; drawing on generative adversarial networks, it is now increasingly being generated through artificial intelligence itself. With this new method of producing data synthetically, the data economy is not only shifting from “data collection” to “data generation.” Synthetic data is also being employed to address some of the most pressing ethical concerns around artificial intelligence. It thereby comes with the sociotechnical imaginary that social problems can be cut out of artificial intelligence by separating training data from real persons. In response to this technical solutionism, this commentary aims to initiate a critical debate about synthetic data that goes beyond misuse scenarios such as the use of generative adversarial networks to create deep fakes or dark patterns. Instead, on a more general level, we seek to complicate the idea of “solving,” i.e., “closing” and thus “silencing,” the ethico-political debates for which synthetic data is supposed to be a solution, by showing how synthetic data itself is political. Drawing on the complex connections between recent uses of synthetic data and public debates about artificial intelligence, we therefore propose to consider and analyze synthetic data not only as a technical device but as a discursive one as well. To this end, we shed light on its relationship to three pillars that we see associated with it: (a) algorithmic bias, (b) privacy, and (c) platform economy.