Variant effect prediction tools assessed using independent, functional assay-based datasets: implications for discovery and diagnostics

Khalid Mahmood; Chol-hee Jung; Gayle Philip; Peter Georgeson; Jessica Chung; Bernard J. Pope; Daniel J. Park

doi:10.1186/s40246-017-0104-8

Human Genomics (May 2017)

Affiliations

Khalid Mahmood: Melbourne Bioinformatics, The University of Melbourne
Chol-hee Jung: Melbourne Bioinformatics, The University of Melbourne
Gayle Philip: Melbourne Bioinformatics, The University of Melbourne
Peter Georgeson: Melbourne Bioinformatics, The University of Melbourne
Jessica Chung: Melbourne Bioinformatics, The University of Melbourne
Bernard J. Pope: Melbourne Bioinformatics, The University of Melbourne
Daniel J. Park: Melbourne Bioinformatics, The University of Melbourne

DOI: https://doi.org/10.1186/s40246-017-0104-8
Journal volume & issue: Vol. 11, no. 1, pp. 1–8

Abstract

Background: Genetic variant effect prediction algorithms are used extensively in clinical genomics and research to determine the likely consequences of amino acid substitutions on protein function. It is vital that we better understand their accuracies and limitations, because published performance metrics are confounded by serious problems of circularity and error propagation. Here, we derive three independent, functionally determined human mutation datasets, UniFun, BRCA1-DMS and TP53-TA, and employ them, alongside previously described datasets, to assess the pre-eminent variant effect prediction tools.

Results: Apparent accuracies of variant effect prediction tools were influenced significantly by the benchmarking dataset. Benchmarking with the assay-determined datasets UniFun and BRCA1-DMS yielded areas under the receiver operating characteristic curves in the modest ranges of 0.52 to 0.63 and 0.54 to 0.75, respectively, considerably lower than those observed for other, potentially more conflicted datasets.

Conclusions: These results raise concerns about how such algorithms should be employed, particularly in a clinical setting. Contemporary variant effect prediction tools are unlikely to be as accurate at the general prediction of functional impacts on proteins as previously reported. Use of functional assay-based datasets that avoid prior dependencies promises to be valuable for the ongoing development and accurate benchmarking of such tools.
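
The benchmarking metric named in the abstract, the area under the receiver operating characteristic curve (AUC), can be read as the probability that a randomly chosen functionally damaging variant receives a higher prediction score than a randomly chosen neutral one, with ties counted as half. The following is a minimal sketch of that calculation only, not the authors' pipeline. The function name `roc_auc`, the label convention (1 = damaging in the assay, 0 = neutral) and the toy scores are all assumptions made for illustration; they are not data from UniFun, BRCA1-DMS or TP53-TA.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for variants the assay calls damaging, 0 for neutral (assumed convention).
    scores: predictor output, where a higher score means "more likely damaging".
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one variant of each class")

    # Count damaging/neutral pairs that the predictor ranks correctly;
    # a tied pair contributes half a win.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, invented for this example.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.5, 0.3, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the reported ranges of 0.52 to 0.63 and 0.54 to 0.75 are described as modest.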

Published in Human Genomics

ISSN: 1479-7364 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine; Science: Biology (General): Genetics
Website: https://humgenomics.biomedcentral.com/
