Scientific Data (Sep 2024)

Digital Microbe: a genome-informed data integration framework for team science on emerging model organisms

  • Iva Veseli,
  • Michelle A. DeMers,
  • Zachary S. Cooper,
  • Matthew S. Schechter,
  • Samuel Miller,
  • Laura Weber,
  • Christa B. Smith,
  • Lidimarie T. Rodriguez,
  • William F. Schroer,
  • Matthew R. McIlvin,
  • Paloma Z. Lopez,
  • Makoto Saito,
  • Sonya Dyhrman,
  • A. Murat Eren,
  • Mary Ann Moran,
  • Rogier Braakman

DOI
https://doi.org/10.1038/s41597-024-03778-z
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 13

Abstract

The remarkable pace of genomic data generation is rapidly transforming our understanding of life at the micron scale. Yet this data stream also creates challenges for team science. A single microbe can have multiple versions of genome architecture, functional gene annotations, and gene identifiers; additionally, the lack of mechanisms for collating and preserving advances in this knowledge raises barriers to community coalescence around shared datasets. “Digital Microbes” are frameworks for interoperable and reproducible collaborative science through open source, community-curated data packages built on a (pan)genomic foundation. Housed within an integrative software environment, Digital Microbes ensure real-time alignment of research efforts for collaborative teams and facilitate novel scientific insights as new layers of data are added. Here we describe two Digital Microbes: 1) the heterotrophic marine bacterium Ruegeria pomeroyi DSS-3 with >100 transcriptomic datasets from lab and field studies, and 2) the pangenome of the cosmopolitan marine heterotroph Alteromonas containing 339 genomes. Examples demonstrate how an integrated framework collating public (pan)genome-informed data can generate novel and reproducible findings.