Scientific Data (May 2024)

A dataset for measuring the impact of research data and their curation

  • Libby Hemphill,
  • Andrea Thomer,
  • Sara Lafia,
  • Lizhou Fan,
  • David Bleckley,
  • Elizabeth Moss

DOI
https://doi.org/10.1038/s41597-024-03303-2
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 8

Abstract

Read online

Abstract Science funders, publishers, and data archives make decisions about how to responsibly allocate resources to maximize the reuse potential of research data. This paper introduces a dataset developed to measure the impact of archival and data curation decisions on data reuse. The dataset describes 10,605 social science research datasets, their curation histories, and reuse contexts in 94,755 publications that cover 59 years from 1963 to 2022. The dataset was constructed from study-level metadata, citing publications, and curation records available through the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan. The dataset includes information about study-level attributes (e.g., PIs, funders, subject terms); usage statistics (e.g., downloads, citations); archiving decisions (e.g., curation activities, data transformations); and bibliometric attributes (e.g., journals, authors) for citing publications. This dataset provides information on factors that contribute to long-term data reuse, which can inform the design of effective evidence-based recommendations to support high-impact research data curation decisions.