Journal of Cultural Analytics (Mar 2021)

Images of the arXiv: Reconfiguring large scientific image datasets

  • Kynan Tan
  • Anna Munster
  • Adrian Mackenzie

Journal volume & issue
Vol. 6, no. 1

Abstract

In an ongoing research project on the ascendancy of statistical visual forms, we have been concerned with the transformations wrought by such images, and by their organisation as datasets, in ‘re-drawing’ knowledge about empirical phenomena. Historians and science studies researchers have long established the generative, rather than simply illustrative, role of images and figures within scientific practice. More recently, the tools, techniques, and practices of working with large (image) datasets have affected how scientific research deploys and generates images and how it communicates them through publication. Against this background, we built a dataset of more than 10 million images drawn from all preprint articles deposited in the open access repository arXiv from 1991 (its inception) until the end of 2018. In this article, we suggest ways of exploring large datasets of statistical scientific images, including machine learning algorithms that make it possible to ‘slice’ visually through the image data and metadata. By treating all forms of visual material found in scientific publications (diagrams, photographs, or instrument data) as bare images, we developed methods for tracking their movements across a range of scientific research. We suggest that these methods offer different entry points into large scientific image datasets and that they raise a new set of questions about how scientific representation might operate at a more-than-human scale.