Machine learning and the politics of synthetic data

Benjamin N Jacobsen

doi:10.1177/20539517221145372

Big Data & Society (Jan 2023)

Journal volume & issue: Vol. 10

Abstract

Machine-learning algorithms have become deeply embedded in contemporary society. As such, ample attention has been paid to the contents, biases, and underlying assumptions of the datasets on which many algorithmic models are trained. Yet what happens when algorithms are trained on data that are not real but ‘synthetic’, referring to no real persons, objects, or events? Increasingly, synthetic data are being incorporated into the training of machine-learning algorithms for use in various societal domains. There is currently little understanding, however, of the role played by synthetic training data, or of their ethicopolitical implications, for machine-learning algorithms. In this article, I explore the politics of synthetic data through two central aspects. First, synthetic data promise to emerge as a rich source of exposure to variability for the algorithm. Second, synthetic data promise to place algorithms beyond the realm of risk. I propose that an analysis of these two areas will help us better understand not only the ways in which machine-learning algorithms are envisioned in the light of synthetic data, but also how synthetic training data actively reconfigure the conditions of possibility for machine learning in contemporary society.

Published in Big Data & Society

ISSN: 2053-9517 (Online)
Publisher: SAGE Publishing
Country of publisher: United States
LCC subjects: General Works
Website: https://journals.sagepub.com/home/bds
