Viruses (Dec 2020)
NCBI’s Virus Discovery Codeathon: Building “FIVE” —The Federated Index of Viral Experiments API Index
- Joan Martí-Carreras,
- Alejandro Rafael Gener,
- Sierra D. Miller,
- Anderson F. Brito,
- Christiam E. Camacho,
- Ryan Connor,
- Ward Deboutte,
- Cody Glickman,
- David M. Kristensen,
- Wynn K. Meyer,
- Sejal Modha,
- Alexis L. Norris,
- Surya Saha,
- Anna K. Belford,
- Evan Biederstedt,
- James Rodney Brister,
- Jan P. Buchmann,
- Nicholas P. Cooley,
- Robert A. Edwards,
- Kiran Javkar,
- Michael Muchow,
- Harihara Subrahmaniam Muralidharan,
- Charles Pepe-Ranney,
- Nidhi Shah,
- Migun Shakya,
- Michael J. Tisza,
- Benjamin J. Tully,
- Bert Vanmechelen,
- Valerie C. Virta,
- JL Weissman,
- Vadim Zalunin,
- Alexandre Efremov,
- Ben Busby
Affiliations
- Joan Martí-Carreras
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium
- Alejandro Rafael Gener
- Integrative Molecular and Biomedical Sciences Program, Baylor College of Medicine, Houston, TX 77030, USA
- Sierra D. Miller
- Genetics & Molecular Biology, Millersville University, 40 Dilworth Rd, Millersville, PA 17551, USA
- Anderson F. Brito
- Department of Epidemiology of Microbial Diseases, Yale School of Public Health (YSPH), 60 College Street, New Haven, CT 06510, USA
- Christiam E. Camacho
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA
- Ryan Connor
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA
- Ward Deboutte
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium
- Cody Glickman
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium
- David M. Kristensen
- Computational Bioscience Program, University of Colorado Anschutz, Aurora, CO 80045, USA
- Wynn K. Meyer
- AAAS Science and Technology Policy Fellow, Office of Data Science Strategy, Division of Program Coordination, Planning, and Strategic Initiatives, Office of the Director, National Institutes of Health, 31 Center Dr., Bethesda, MD 20894, USA
- Sejal Modha
- MRC-University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Alexis L. Norris
- Biotechnology Graduate Program, University of Maryland Global Campus, 1616 McCormick Drive, Largo, MD 20774, USA
- Surya Saha
- Boyce Thompson Institute, Ithaca, NY 14850, USA
- Anna K. Belford
- Laboratory of Cellular Oncology, National Cancer Institute, 37 Convent Dr., Bethesda, MD 20894, USA
- Evan Biederstedt
- Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
- James Rodney Brister
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA
- Jan P. Buchmann
- School of Life and Environmental Sciences and School of Medical Sciences, Marie Bashir Institute for Infectious Diseases and Biosecurity, The University of Sydney, Sydney, Australia
- Nicholas P. Cooley
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Robert A. Edwards
- College of Science and Engineering, Flinders University, Bedford Park, SA 5042, Australia
- Kiran Javkar
- Department of Computer Science, University of Maryland, College Park, MD 20740, USA
- Michael Muchow
- Novel Microdevices, Nucleic Acids, Baltimore, MD 21202, USA
- Harihara Subrahmaniam Muralidharan
- Department of Computer Science, University of Maryland, College Park, MD 20740, USA
- Charles Pepe-Ranney
- AgBiome, 104 TW Alexander, Research Triangle, NC 27709, USA
- Nidhi Shah
- Department of Computer Science, University of Maryland, College Park, MD 20740, USA
- Migun Shakya
- Bioscience Division, Bikini Atoll Road, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Michael J. Tisza
- Laboratory of Cellular Oncology, National Cancer Institute, 37 Convent Dr., Bethesda, MD 20894, USA
- Benjamin J. Tully
- Center for Dark Energy Biosphere Investigations, University of Southern California, Los Angeles, CA 90089, USA
- Bert Vanmechelen
- Laboratory of Clinical and Epidemiological Virology, KU Leuven Department of Microbiology, Immunology and Transplantation, Rega Institute, BE3000 Leuven, Belgium
- Valerie C. Virta
- AAAS Science & Technology Policy Fellow, National Institutes of Health, Center for Information Technology, 6555 Rock Spring Drive, Bethesda, MD 20817, USA
- JL Weissman
- Department of Marine and Environmental Biology, University of Southern California, Los Angeles, CA 90089, USA
- Vadim Zalunin
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA
- Alexandre Efremov
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA
- Ben Busby
- National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD 20894, USA
- DOI
- https://doi.org/10.3390/v12121424
- Journal volume & issue
- Vol. 12, no. 12, p. 1424
Abstract
Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, one consequence of previously decentralized (unfederated) data is a lack of consensus on, or comparison between, feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was developed. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof of concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As of the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access to large quantities of data and is publicly accessible. Many database or index federation projects fail to provide easier ways to access or query information. To address this, a Python API query system was developed to enhance the accessibility of FIVE.
Keywords