The VAAST Variant Prioritizer (VVP): ultrafast, easy to use whole genome variant prioritization tool

BMC Bioinformatics. 2018;19(1):1-13. DOI: 10.1186/s12859-018-2056-y

 

Journal Title: BMC Bioinformatics

ISSN: 1471-2105 (Online)

Publisher: BMC

LCC Subject Category: Medicine: Medicine (General): Computer applications to medicine. Medical informatics | Science: Biology (General)

Country of publisher: United Kingdom

Language of fulltext: English

Full-text formats available: PDF, HTML

 

AUTHORS

Steven Flygare (Department of Human Genetics, University of Utah)
Edgar Javier Hernandez (Department of Human Genetics, University of Utah)
Lon Phan (National Center for Biotechnology Information)
Barry Moore (Department of Human Genetics, University of Utah)
Man Li (Department of Human Genetics, University of Utah)
Anthony Fejes (Fabric Genomics)
Hao Hu (Department of Epidemiology, M.D. Anderson Cancer Center)
Karen Eilbeck (USTAR Center for Genetic Discovery)
Chad Huff (Department of Epidemiology, M.D. Anderson Cancer Center)
Lynn Jorde (Department of Human Genetics, University of Utah)
Martin G. Reese (Fabric Genomics)
Mark Yandell (Department of Human Genetics, University of Utah)

EDITORIAL INFORMATION

Blind peer review

Time From Submission to Publication: 19 weeks

 

ABSTRACT

Background: Prioritization of sequence variants for diagnosis and discovery of Mendelian diseases is challenging, especially in large collections of whole genome sequences (WGS). Fast, scalable solutions are needed for discovery research, for clinical applications, and for curation of massive public variant repositories such as dbSNP and gnomAD. In response, we have developed VVP, the VAAST Variant Prioritizer. VVP is ultrafast, scales to even the largest variant repositories and genome collections, and its outputs are designed to simplify clinical interpretation of variants of uncertain significance.

Results: We show that scoring the entire contents of dbSNP (> 155 million variants) requires only 95 min using a machine with 4 CPUs and 16 GB of RAM, and that a 60X WGS can be processed in less than 5 min. We also demonstrate that VVP can score variants anywhere in the genome, regardless of type, effect, or location. It does so by integrating sequence conservation, the type of sequence change, allele frequencies, variant burden, and zygosity. Finally, we show that VVP scores are consistently accurate and easily interpreted, traits not shared by many commonly used tools such as SIFT and CADD.

Conclusions: VVP provides a rapid and scalable means to prioritize any sequence variant, anywhere in the genome, and its scores are designed to facilitate variant interpretation using ACMG and NHS guidelines. These traits make it well suited for operation on very large collections of whole genome sequences.
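To put the reported dbSNP runtime in perspective, the figures in the abstract can be turned into an average throughput. The sketch below is only back-of-the-envelope arithmetic on the quoted numbers; the names are illustrative, and the per-CPU figure assumes all four CPUs were used for the whole run, which the abstract does not state.

```python
# Figures quoted in the abstract; the variant count is a lower bound ("> 155 million").
DBSNP_VARIANTS = 155_000_000
RUNTIME_MINUTES = 95
CPUS = 4  # assumption: all four CPUs were busy for the whole run


def throughput_per_second(n_variants: int, minutes: float) -> float:
    """Return the average number of variants scored per second."""
    return n_variants / (minutes * 60)


overall = throughput_per_second(DBSNP_VARIANTS, RUNTIME_MINUTES)
per_cpu = overall / CPUS

print(f"overall: at least {overall:,.0f} variants/s")  # at least 27,193 variants/s
print(f"per CPU: at least {per_cpu:,.0f} variants/s")  # at least 6,798 variants/s
```

Because the abstract gives the variant count as a lower bound, both rates are lower bounds as well.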