Data Science Journal (Feb 2022)

Guidelines for Publicly Archiving Terrestrial Model Data to Enhance Usability, Intercomparison, and Synthesis

  • Maegen B. Simmonds
  • William J. Riley
  • Deborah A. Agarwal
  • Xingyuan Chen
  • Shreyas Cholia
  • Robert Crystal-Ornelas
  • Ethan T. Coon
  • Dipankar Dwivedi
  • Valerie C. Hendrix
  • Maoyi Huang
  • Ahmad Jan
  • Zarine Kakalia
  • Jitendra Kumar
  • Charles D. Koven
  • Li Li
  • Mario Melara
  • Lavanya Ramakrishnan
  • Daniel M. Ricciuto
  • Anthony P. Walker
  • Wei Zhi
  • Qing Zhu
  • Charuleka Varadharajan

DOI
https://doi.org/10.5334/dsj-2022-003
Journal volume & issue
Vol. 21, no. 1

Abstract

Scientific communities are increasingly publishing data to evaluate, accredit, and build on published research. However, guidelines for curating data for publication are sparse for model-related research, limiting the usability of archived simulation data. In particular, there are no established guidelines for archiving data related to terrestrial models that simulate land processes and their coupled interactions with climate. Terrestrial modelers have a unique set of challenges when publishing data due to the diversity of scientific domains, research questions, and the types and scales of simulations. Researchers in the U.S. Department of Energy’s (DOE) projects use a variety of multiscale models to advance robust predictions of terrestrial and subsurface ecosystem processes. Here, we synthesize archiving needs for data associated with different DOE models, and provide guidelines for publishing terrestrial model data components following FAIR (Findable, Accessible, Interoperable, Reusable) principles. The guidelines recommend archiving model inputs and testing data used in final simulation runs along with associated codes, workflow scripts, and metadata in public repositories. Researchers should consider archiving model outputs if they are within the storage limits of the repository. We also provide considerations for how to bundle files into different data publications with citable digital object identifiers. Finally, we identify repository features and tools that would enable storage and reuse of model data. Given the diversity of DOE terrestrial models, these guidelines are transferable to other model types and will enable efficient reuse of simulation data for purposes such as model intercomparisons, initialization, benchmarking, synthesis, and comparisons with field observations.

Keywords