Scientific Data (Jun 2023)

Signing data citations enables data verification and citation persistence

  • Michael J. Elliott,
  • Jorrit H. Poelen,
  • José A. B. Fortes

DOI
https://doi.org/10.1038/s41597-023-02230-y
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 13

Abstract

Read online

Abstract Commonly used data citation practices rely on unverifiable retrieval methods which are susceptible to content drift, which occurs when the data associated with an identifier have been allowed to change. Based on our earlier work on reliable dataset identifiers, we propose signed citations, i.e., customary data citations extended to also include a standards-based, verifiable, unique, and fixed-length digital content signature. We show that content signatures enable independent verification of the cited content and can improve the persistence of the citation. Because content signatures are location- and storage-medium-agnostic, cited data can be copied to new locations to ensure their persistence across current and future storage media and data networks. As a result, content signatures can be leveraged to help scalably store, locate, access, and independently verify content across new and existing data infrastructures. Content signatures can also be embedded inside content to create robust, distributed knowledge graphs that can be cited using a single signed citation. We describe applications of signed citations to solve real-world data collection, identification, and citation challenges.