Dyport: dynamic importance-based biomedical hypothesis generation benchmarking technique

Ilya Tyagin; Ilya Safro

doi:10.1186/s12859-024-05812-8

BMC Bioinformatics (Jun 2024)

Affiliations

Ilya Tyagin: Center for Bioinformatics and Computational Biology, University of Delaware
Ilya Safro: Department of Computer and Information Sciences, University of Delaware

DOI: https://doi.org/10.1186/s12859-024-05812-8
Journal volume & issue: Vol. 25, no. 1
pp. 1 – 28

Abstract

Background: Automated hypothesis generation (HG) focuses on uncovering hidden connections within the extensive information that is publicly available. This domain has become increasingly popular, thanks to modern machine learning algorithms. However, the automated evaluation of HG systems is still an open problem, especially on a larger scale.

Results: This paper presents Dyport, a novel benchmarking framework for evaluating biomedical hypothesis generation systems. Utilizing curated datasets, our approach tests these systems under realistic conditions, enhancing the relevance of our evaluations. We integrate knowledge from the curated databases into a dynamic graph, accompanied by a method to quantify discovery importance. This assesses not only the accuracy of hypotheses but also their potential impact in biomedical research, which significantly extends traditional link prediction benchmarks. The applicability of our benchmarking process is demonstrated on several link prediction systems applied to biomedical semantic knowledge graphs. Being flexible, our benchmarking system is designed for broad application in hypothesis generation quality verification, aiming to expand the scope of scientific discovery within the biomedical research community.

Conclusions: Dyport is an open-source benchmarking framework designed for the evaluation of biomedical hypothesis generation systems, which takes into account knowledge dynamics, semantics and impact. All code and datasets are available at: https://github.com/IlyaTyagin/Dyport

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
