BMC Bioinformatics (Jul 2023)

STARVar: symptom-based tool for automatic ranking of variants using evidence from literature and genomes

  • Șenay Kafkas,
  • Marwa Abdelhakim,
  • Mahmut Uludag,
  • Azza Althagafi,
  • Malak Alghamdi,
  • Robert Hoehndorf

DOI
https://doi.org/10.1186/s12859-023-05406-w
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 17

Abstract

Background: Identifying variants associated with diseases is a challenging task in medical genetics research. Current studies that prioritize variants within individual genomes generally rely on known variants, evidence from literature and genomes, and patient symptoms and clinical signs. Existing tools that rank variants based on given patient symptoms and clinical signs are restricted to the coverage of ontologies such as the Human Phenotype Ontology (HPO). However, most clinicians do not limit themselves to HPO when describing patient symptoms and signs and their associated variants and genes. There is thus a need for an automated tool that can prioritize variants based on freely expressed patient symptoms and clinical signs.

Results: STARVar is a Symptom-based Tool for Automatic Ranking of Variants using evidence from literature and genomes. STARVar uses patient symptoms and clinical signs, either linked to HPO or expressed in free-text format. It returns a ranked list of variants based on a combined score from two classifiers utilizing evidence from genomics and literature. STARVar improves over related tools on a set of synthetic patients. In addition, we demonstrated its distinct contribution to the domain on another synthetic dataset covering publicly available clinical genotype–phenotype associations by using symptoms and clinical signs expressed in free-text format.

Conclusions: STARVar stands as a unique and efficient tool that has the advantage of ranking variants with flexibly expressed patient symptoms in free-form text. Therefore, STARVar can be easily integrated into bioinformatics workflows designed to analyze disease-associated genomes.

Availability: STARVar is freely available from https://github.com/bio-ontology-research-group/STARVar.
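
The abstract states that variants are ranked by a combined score from two classifiers, one drawing on genomic evidence and one on literature evidence, but it does not say how the two scores are combined. The following is a minimal sketch of the general idea only, assuming a simple weighted average; the names `ScoredVariant`, `combined_score`, `rank_variants`, the `weight` parameter, and the example scores are all hypothetical and are not taken from STARVar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredVariant:
    """A candidate variant with two illustrative classifier outputs in [0, 1]."""
    variant_id: str
    genomic_score: float     # hypothetical score from a genomics-based classifier
    literature_score: float  # hypothetical score from a literature-based classifier


def combined_score(variant: ScoredVariant, weight: float = 0.5) -> float:
    """Weighted average of the two scores.

    ASSUMPTION: the abstract does not specify the combination rule;
    a weighted average is used here purely for illustration.
    """
    return weight * variant.genomic_score + (1.0 - weight) * variant.literature_score


def rank_variants(variants: list[ScoredVariant], weight: float = 0.5) -> list[ScoredVariant]:
    """Return the variants sorted from highest to lowest combined score."""
    return sorted(variants, key=lambda v: combined_score(v, weight), reverse=True)


# Made-up candidates, used only to show the ranking step.
candidates = [
    ScoredVariant("var_A", genomic_score=0.25, literature_score=0.75),
    ScoredVariant("var_B", genomic_score=0.75, literature_score=0.75),
    ScoredVariant("var_C", genomic_score=0.5, literature_score=0.0),
]

ranked = rank_variants(candidates)
for position, variant in enumerate(ranked, start=1):
    print(position, variant.variant_id, combined_score(variant))
```

With equal weights, a variant supported by both evidence sources ranks above one supported by only one of them; changing `weight` shifts the balance between the two sources.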

Keywords