Comprehensive assessment of computational algorithms in predicting cancer driver mutations

Hu Chen; Jun Li; Yumeng Wang; Patrick Kwok-Shing Ng; Yiu Huen Tsang; Kenna R. Shaw; Gordon B. Mills; Han Liang

doi:10.1186/s13059-020-01954-z

Genome Biology (Feb 2020)

Affiliations

Hu Chen: Graduate Program in Quantitative and Computational Biosciences, Baylor College of Medicine
Jun Li: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center
Yumeng Wang: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center
Patrick Kwok-Shing Ng: Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center
Yiu Huen Tsang: Department of Cell, Developmental & Cancer Biology, Knight Cancer Institute, Oregon Health Sciences University
Kenna R. Shaw: Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center
Gordon B. Mills: Department of Cell, Developmental & Cancer Biology, Knight Cancer Institute, Oregon Health Sciences University
Han Liang: Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center

DOI: https://doi.org/10.1186/s13059-020-01954-z
Journal volume & issue: Vol. 21, no. 1, pp. 1–17

Abstract

Background: The initiation and subsequent evolution of cancer are largely driven by a relatively small number of somatic mutations with critical functional impacts, so-called driver mutations. Identifying driver mutations in a patient’s tumor cells is a central task in the era of precision cancer medicine. Over the past decade, many computational algorithms have been developed to predict the effects of missense single-nucleotide variants, and they are frequently employed to prioritize mutation candidates. These algorithms employ diverse molecular features to build predictive models; some are cancer-specific, while others are not. However, the relative performance of these algorithms has not been rigorously assessed.

Results: We construct five complementary benchmark datasets: mutation clustering patterns in protein 3D structures; literature annotation based on OncoKB; TP53 mutations based on their effects on target-gene transactivation; effects of cancer mutations on tumor formation in xenograft experiments; and functional annotation based on in vitro cell viability assays we developed, including a new dataset of ~200 mutations. We evaluate the performance of 33 algorithms and find that CHASM, CTAT-cancer, DEOGEN2, and PrimateAI show consistently better performance than the other algorithms. Moreover, cancer-specific algorithms show much better performance than those designed for a general purpose.

Conclusions: Our study is a comprehensive assessment of the performance of different algorithms in predicting cancer driver mutations. It provides deep insights into the best practice of computationally prioritizing cancer mutation candidates, both for end-users and for the future development of new algorithms.

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
