Frontiers in Cellular and Infection Microbiology (May 2024)

FAIR compliant database development for human microbiome data samples

  • Mathieu Dorst,
  • Nathan Zeevenhooven,
  • Rory Wilding,
  • Daniel Mende,
  • Bernd W. Brandt,
  • Egija Zaura,
  • Alfons Hoekstra,
  • Vivek M. Sheraton

DOI
https://doi.org/10.3389/fcimb.2024.1384809
Journal volume & issue
Vol. 14

Abstract

Read online

IntroductionSharing microbiome data among researchers fosters new innovations and reduces cost for research. Practically, this means that the (meta)data will have to be standardized, transparent and readily available for researchers. The microbiome data and associated metadata will then be described with regards to composition and origin, in order to maximize the possibilities for application in various contexts of research. Here, we propose a set of tools and protocols to develop a real-time FAIR (Findable. Accessible, Interoperable and Reusable) compliant database for the handling and storage of human microbiome and host-associated data.MethodsThe conflicts arising from privacy laws with respect to metadata, possible human genome sequences in the metagenome shotgun data and FAIR implementations are discussed. Alternate pathways for achieving compliance in such conflicts are analyzed. Sample traceable and sensitive microbiome data, such as DNA sequences or geolocalized metadata are identified, and the role of the GDPR (General Data Protection Regulation) data regulations are considered. For the construction of the database, procedures have been realized to make data FAIR compliant, while preserving privacy of the participants providing the data.Results and discussionAn open-source development platform, Supabase, was used to implement the microbiome database. Researchers can deploy this real-time database to access, upload, download and interact with human microbiome data in a FAIR complaint manner. In addition, a large language model (LLM) powered by ChatGPT is developed and deployed to enable knowledge dissemination and non-expert usage of the database.

Keywords