Cell Genomics (Jan 2022)
Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space
- Michael C. Schatz,
- Anthony A. Philippakis,
- Enis Afgan,
- Eric Banks,
- Vincent J. Carey,
- Robert J. Carroll,
- Alessandro Culotti,
- Kyle Ellrott,
- Jeremy Goecks,
- Robert L. Grossman,
- Ira M. Hall,
- Kasper D. Hansen,
- Jonathan Lawson,
- Jeffrey T. Leek,
- Anne O’Donnell Luria,
- Stephen Mosher,
- Martin Morgan,
- Anton Nekrutenko,
- Brian D. O’Connor,
- Kevin Osborn,
- Benedict Paten,
- Candace Patterson,
- Frederick J. Tan,
- Casey Overby Taylor,
- Jennifer Vessio,
- Levi Waldron,
- Ting Wang,
- Kristin Wuichet,
- Alexander Baumann,
- Andrew Rula,
- Anton Kovalsy,
- Clare Bernard,
- Derek Caetano-Anollés,
- Geraldine A. Van der Auwera,
- Justin Canas,
- Kaan Yuksel,
- Kate Herman,
- M. Morgan Taylor,
- Marianie Simeon,
- Michael Baumann,
- Qi Wang,
- Robert Title,
- Ruchi Munshi,
- Sushma Chaluvadi,
- Valerie Reeves,
- William Disman,
- Salin Thomas,
- Allie Hajian,
- Elizabeth Kiernan,
- Namrata Gupta,
- Trish Vosburg,
- Ludwig Geistlinger,
- Marcel Ramos,
- Sehyun Oh,
- Dave Rogers,
- Frances McDade,
- Mim Hastie,
- Nitesh Turaga,
- Alexander Ostrovsky,
- Alexandru Mahmoud,
- Dannon Baker,
- Dave Clements,
- Katherine E.L. Cox,
- Keith Suderman,
- Nataliya Kucher,
- Sergey Golitsynskiy,
- Samantha Zarate,
- Sarah J. Wheelan,
- Kai Kammers,
- Ana Stevens,
- Carolyn Hutter,
- Christopher Wellington,
- Elena M. Ghanaim,
- Ken L. Wiley, Jr.,
- Shurjo K. Sen,
- Valentina Di Francesco,
- Denis Yuen,
- Brian Walsh,
- Luke Sargent,
- Vahid Jalili,
- John Chilton,
- Lori Shepherd,
- B.J. Stubbs,
- Ash O’Farrell,
- Benton A. Vizzier, Jr.,
- Charles Overbeck,
- Charles Reid,
- David Charles Steinberg,
- Elizabeth A. Sheets,
- Julian Lucas,
- Lon Blauvelt,
- Louise Cabansay,
- Noah Warren,
- Brian Hannafious,
- Tim Harris,
- Radhika Reddy,
- Eric Torstenson,
- M. Katie Banasiewicz,
- Haley J. Abel,
- Jason Walker
Affiliations
- Michael C. Schatz
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; Corresponding author
- Anthony A. Philippakis
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Corresponding author
- Enis Afgan
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Eric Banks
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Vincent J. Carey
- Harvard Medical School, Harvard University, Cambridge, MA, USA
- Robert J. Carroll
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Alessandro Culotti
- Broad Institute of MIT and Harvard, Cambridge, MA, USA; Center for Translational Data Science, University of Chicago, Chicago, IL, USA
- Kyle Ellrott
- Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Jeremy Goecks
- Biomedical Engineering, Oregon Health & Science University, Portland, OR, USA
- Robert L. Grossman
- Center for Translational Data Science, University of Chicago, Chicago, IL, USA
- Ira M. Hall
- Yale School of Medicine, Yale University, New Haven, CT, USA
- Kasper D. Hansen
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Jonathan Lawson
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Jeffrey T. Leek
- Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Anne O’Donnell Luria
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Stephen Mosher
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Martin Morgan
- Department of Biostatistics and Bioinformatics, Roswell Park Comprehensive Cancer Center, Buffalo, NY, USA
- Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA, USA
- Brian D. O’Connor
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Kevin Osborn
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Benedict Paten
- UC Santa Cruz Genomics Institute, UC Santa Cruz, Santa Cruz, CA, USA
- Candace Patterson
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Frederick J. Tan
- Department of Embryology, Carnegie Institution, Baltimore, MD, USA
- Casey Overby Taylor
- Departments of Medicine and Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jennifer Vessio
- Department of Biology, Johns Hopkins University, Baltimore, MD, USA
- Levi Waldron
- Department of Epidemiology and Biostatistics, City University of New York Graduate School of Public Health and Health Policy, New York, NY, USA
- Ting Wang
- Department of Genetics, Washington University in St. Louis, St. Louis, MO, USA
- Kristin Wuichet
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Alexander Baumann
- Andrew Rula
- Anton Kovalsy
- Clare Bernard
- Derek Caetano-Anollés
- Geraldine A. Van der Auwera
- Justin Canas
- Kaan Yuksel
- Kate Herman
- M. Morgan Taylor
- Marianie Simeon
- Michael Baumann
- Qi Wang
- Robert Title
- Ruchi Munshi
- Sushma Chaluvadi
- Valerie Reeves
- William Disman
- Salin Thomas
- Allie Hajian
- Elizabeth Kiernan
- Namrata Gupta
- Trish Vosburg
- Ludwig Geistlinger
- Marcel Ramos
- Sehyun Oh
- Dave Rogers
- Frances McDade
- Mim Hastie
- Nitesh Turaga
- Alexander Ostrovsky
- Alexandru Mahmoud
- Dannon Baker
- Dave Clements
- Katherine E.L. Cox
- Keith Suderman
- Nataliya Kucher
- Sergey Golitsynskiy
- Samantha Zarate
- Sarah J. Wheelan
- Kai Kammers
- Ana Stevens
- Carolyn Hutter
- Christopher Wellington
- Elena M. Ghanaim
- Ken L. Wiley, Jr.
- Shurjo K. Sen
- Valentina Di Francesco
- Denis Yuen
- Brian Walsh
- Luke Sargent
- Vahid Jalili
- John Chilton
- Lori Shepherd
- B.J. Stubbs
- Ash O’Farrell
- Benton A. Vizzier, Jr.
- Charles Overbeck
- Charles Reid
- David Charles Steinberg
- Elizabeth A. Sheets
- Julian Lucas
- Lon Blauvelt
- Louise Cabansay
- Noah Warren
- Brian Hannafious
- Tim Harris
- Radhika Reddy
- Eric Torstenson
- M. Katie Banasiewicz
- Haley J. Abel
- Jason Walker
- Journal volume & issue
- Vol. 2, no. 1, p. 100085
Abstract
Summary: The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present the AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how the platform and associated initiatives contribute to improved genomic data sharing. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement, adds security measures for active threat detection and monitoring, and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consist of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.