Scientific Reports (Dec 2021)

Trellis for efficient data and task management in the VA Million Veteran Program

  • Paul Billing Ross,
  • Jina Song,
  • Philip S. Tsao,
  • Cuiping Pan

DOI
https://doi.org/10.1038/s41598-021-02569-5
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 12

Abstract

Biomedical studies have grown in size and now yield large quantities of data, yet efficient data processing remains a challenge. Here we present Trellis, a cloud-based data and task management framework that fully automates the process from data ingestion to result presentation while tracking data lineage, facilitating information queries, and supporting fault tolerance and scalability. Using a graph database to coordinate the state of data processing workflows and a scalable microservice architecture to perform bioinformatics tasks, Trellis has enabled efficient variant calling on 100,000 human genomes collected in the VA Million Veteran Program.
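
The coordination idea in the abstract (a graph that records workflow state, triggers tasks, and preserves data lineage) can be illustrated with a minimal sketch. This is not the Trellis implementation: the in-memory `LineageGraph` class stands in for a real graph database, plain functions stand in for microservices, and every name here (`LineageGraph`, `align`, `call_variants`, the `sample1.*` file names) is hypothetical.

```python
from collections import defaultdict


class LineageGraph:
    """Toy in-memory stand-in for a graph database (hypothetical schema)."""

    def __init__(self):
        self.nodes = {}                    # node id -> properties
        self.parents = defaultdict(list)   # node id -> ids it was derived from
        self.triggers = []                 # (predicate, task) pairs

    def add_trigger(self, predicate, task):
        """Register a task to run whenever a new node satisfies predicate."""
        self.triggers.append((predicate, task))

    def add_node(self, node_id, kind, derived_from=()):
        """Record a data object, then launch any tasks its arrival triggers."""
        self.nodes[node_id] = {"kind": kind}
        self.parents[node_id].extend(derived_from)
        for predicate, task in self.triggers:
            if predicate(self.nodes[node_id]):
                task(self, node_id)

    def lineage(self, node_id):
        """Return all ancestors of node_id, nearest first."""
        seen, queue = [], list(self.parents[node_id])
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self.parents[current])
        return seen


# Stand-ins for bioinformatics microservices: each one only registers
# its output in the graph, which may in turn trigger the next task.
def align(graph, fastq_id):
    bam_id = fastq_id.replace(".fastq", ".bam")
    graph.add_node(bam_id, "bam", derived_from=[fastq_id])


def call_variants(graph, bam_id):
    vcf_id = bam_id.replace(".bam", ".vcf")
    graph.add_node(vcf_id, "vcf", derived_from=[bam_id])


graph = LineageGraph()
graph.add_trigger(lambda node: node["kind"] == "fastq", align)
graph.add_trigger(lambda node: node["kind"] == "bam", call_variants)

# Ingesting one raw file drives the whole (toy) pipeline.
graph.add_node("sample1.fastq", "fastq")
print(graph.lineage("sample1.vcf"))
```

In this sketch, ingesting a single file is the only explicit call; each downstream step runs because the graph changed, and the provenance of any result can be recovered by walking the derived-from edges. A production system would presumably replace the in-memory dictionaries with database queries and the direct function calls with messages to independently scaled services.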