KG-COVID-19: A Framework to Produce Customized Knowledge Graphs for COVID-19 Response
Justin T. Reese,
Deepak Unni,
Tiffany J. Callahan,
Luca Cappelletti,
Vida Ravanmehr,
Seth Carbon,
Kent A. Shefchek,
Benjamin M. Good,
James P. Balhoff,
Tommaso Fontana,
Hannah Blau,
Nicolas Matentzoglu,
Nomi L. Harris,
Monica C. Munoz-Torres,
Melissa A. Haendel,
Peter N. Robinson,
Marcin P. Joachimiak,
Christopher J. Mungall
Affiliations
Justin T. Reese
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Corresponding author
Deepak Unni
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Tiffany J. Callahan
Computational Bioscience Program, Department of Pharmacology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045, USA
Luca Cappelletti
Department of Computer Science, University of Milano, 20122 Milan, Italy
Vida Ravanmehr
The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Seth Carbon
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Kent A. Shefchek
Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
Benjamin M. Good
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
James P. Balhoff
Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC 27517, USA
Tommaso Fontana
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
Hannah Blau
The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Nicolas Matentzoglu
Independent Semantic Technology Contractor, London, UK
Nomi L. Harris
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Monica C. Munoz-Torres
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
Melissa A. Haendel
Linus Pauling Institute, Environmental and Molecular Toxicology, Oregon State University, Corvallis, OR 97331, USA
Peter N. Robinson
The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
Marcin P. Joachimiak
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Christopher J. Mungall
Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Summary: Integrated, up-to-date data about SARS-CoV-2 and COVID-19 are crucial for the biomedical research community's ongoing response to the COVID-19 pandemic. While rich biological knowledge exists for SARS-CoV-2 and related viruses (SARS-CoV, MERS-CoV), integrating this knowledge is difficult and time-consuming, since much of it is in siloed databases or in textual format. Furthermore, the data required by the research community vary drastically across tasks; the optimal data for a machine learning task, for example, differ greatly from the data used to populate a browsable user interface for clinicians. To address these challenges, we created KG-COVID-19, a flexible framework that ingests and integrates heterogeneous biomedical data to produce knowledge graphs (KGs), and applied it to create a KG for COVID-19 response. This framework can also be applied to other problems in which siloed biomedical data must be quickly integrated for different research applications, including future pandemics.
The Bigger Picture: An effective response to the COVID-19 pandemic relies on integrating the many different types of data available about SARS-CoV-2 and related viruses. KG-COVID-19 is a framework for producing knowledge graphs that can be customized for downstream applications, including machine learning tasks, hypothesis-based querying, and browsable user interfaces, enabling researchers to explore COVID-19 data and discover relationships.