An open source knowledge graph ecosystem for the life sciences

Tiffany J. Callahan; Ignacio J. Tripodi; Adrianne L. Stefanski; Luca Cappelletti; Sanya B. Taneja; Jordan M. Wyrwa; Elena Casiraghi; Nicolas A. Matentzoglu; Justin Reese; Jonathan C. Silverstein; Charles Tapley Hoyt; Richard D. Boyce; Scott A. Malec; Deepak R. Unni; Marcin P. Joachimiak; Peter N. Robinson; Christopher J. Mungall; Emanuele Cavalleri; Tommaso Fontana; Giorgio Valentini; Marco Mesiti; Lucas A. Gillenwater; Brook Santangelo; Nicole A. Vasilevsky; Robert Hoehndorf; Tellen D. Bennett; Patrick B. Ryan; George Hripcsak; Michael G. Kahn; Michael Bada; William A. Baumgartner; Lawrence E. Hunter

doi:10.1038/s41597-024-03171-w

Scientific Data (Apr 2024)

Affiliations

Tiffany J. Callahan: Computational Bioscience Program, University of Colorado Anschutz Medical Campus
Ignacio J. Tripodi: Computer Science Department, Interdisciplinary Quantitative Biology, University of Colorado Boulder
Adrianne L. Stefanski: Computational Bioscience Program, University of Colorado Anschutz Medical Campus
Luca Cappelletti: AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano
Sanya B. Taneja: Intelligent Systems Program, University of Pittsburgh
Jordan M. Wyrwa: Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus
Elena Casiraghi: AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano
Nicolas A. Matentzoglu: Semanticly
Justin Reese: Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
Jonathan C. Silverstein: Department of Biomedical Informatics, University of Pittsburgh School of Medicine
Charles Tapley Hoyt: Laboratory of Systems Pharmacology, Harvard Medical School
Richard D. Boyce: Department of Biomedical Informatics, University of Pittsburgh School of Medicine
Scott A. Malec: Division of Translational Informatics, University of New Mexico School of Medicine
Deepak R. Unni: SIB Swiss Institute of Bioinformatics
Marcin P. Joachimiak: Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
Peter N. Robinson: Berlin Institute of Health at Charité-Universitätsmedizin
Christopher J. Mungall: Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
Emanuele Cavalleri: AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano
Tommaso Fontana: AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano
Giorgio Valentini: AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano
Marco Mesiti: AnacletoLab, Dipartimento di Informatica, Università degli Studi di Milano
Lucas A. Gillenwater: Computational Bioscience Program, University of Colorado Anschutz Medical Campus
Brook Santangelo: Computational Bioscience Program, University of Colorado Anschutz Medical Campus
Nicole A. Vasilevsky: Data Collaboration Center, Critical Path Institute
Robert Hoehndorf: Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology
Tellen D. Bennett: Department of Biomedical Informatics, University of Colorado School of Medicine
Patrick B. Ryan: Janssen Research and Development
George Hripcsak: Department of Biomedical Informatics, Columbia University Irving Medical Center
Michael G. Kahn: Department of Biomedical Informatics, University of Colorado School of Medicine
Michael Bada: Division of General Internal Medicine, University of Colorado School of Medicine
William A. Baumgartner: Division of General Internal Medicine, University of Colorado School of Medicine
Lawrence E. Hunter: Computational Bioscience Program, University of Colorado Anschutz Medical Campus

Journal volume & issue: Vol. 11, no. 1, pp. 1–22

Abstract

Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data, but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to construct them automatically. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoint resources and abstraction algorithms), and benchmarks (e.g., prebuilt KGs). We evaluated the ecosystem by systematically comparing it to existing open-source KG construction methods and by analyzing its computational performance when used to construct 12 different large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

Published in Scientific Data

ISSN: 2052-4463 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/sdata/

About the journal