PLoS ONE (Jan 2021)
Extreme value theory as a framework for understanding mutation frequency distribution in cancer genomes.
Abstract
The population dynamics of preclonal cancer cells before the clonal expansion of tumors have not been sufficiently addressed. Treating the preclonal cancer cell population as a Darwinian evolutionary system, we formulated and analyzed the observed mutation frequency among tumors (MFaT) as a proxy for the hypothesized sequence read frequency and beneficial fitness effect of a cancer driver mutation. By analogy with intestinal crypts, we assumed that sample donor patients are separate culture tanks in which proliferating cells follow population dynamics described by extreme value theory (EVT). To validate this assumption, we analyzed three large-scale cancer genome datasets, each harboring > 10,000 tumor samples and together involving > 177,898 observed mutation sites. We clarified the premises necessary for applying EVT in the strong selection and weak mutation (SSWM) regime to cancer genome sequences at scale. We also confirmed that the distribution of MFaT is likely of the Fréchet type, which challenges the well-known Gumbel hypothesis of beneficial fitness effects. Based on this statistical analysis, we demonstrated the potential of EVT as a population genetics framework for understanding and explaining the stochastic behavior of driver-mutation frequency in cancer genomes, as well as its applicability to real cancer genome sequence data.
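The distinction between the Fréchet and Gumbel types drawn above comes down to the sign of the shape parameter of a generalized extreme value (GEV) distribution: a positive shape indicates a heavy (Fréchet) tail, a shape near zero indicates Gumbel, and a negative shape indicates a bounded (Weibull) tail. The following is a minimal sketch of that check on synthetic data, not the analysis pipeline used in this study. The function name classify_gev_tail, the tolerance tol, and the synthetic samples are illustrative assumptions; only scipy.stats.genextreme and its sign convention (its shape c equals the negative of the usual GEV shape) are taken from SciPy.

```python
import numpy as np
from scipy import stats


def classify_gev_tail(sample, tol=0.05):
    """Fit a GEV by maximum likelihood and label the tail type.

    `tol` is an illustrative threshold (an assumption, not from the paper)
    for treating the fitted shape as effectively zero.
    """
    # SciPy parameterizes genextreme with c = -xi, where xi is the
    # conventional GEV shape parameter.
    c, loc, scale = stats.genextreme.fit(sample)
    xi = -c
    if xi > tol:
        label = "Frechet"   # heavy upper tail
    elif xi < -tol:
        label = "Weibull"   # bounded upper tail
    else:
        label = "Gumbel"    # exponential-type tail
    return xi, label


# Synthetic stand-ins for frequency data (assumed, for illustration only).
rng = np.random.default_rng(0)
# invweibull is SciPy's Frechet distribution; shape 2 corresponds to xi = 0.5.
frechet_sample = stats.invweibull.rvs(2.0, size=2000, random_state=rng)
gumbel_sample = stats.gumbel_r.rvs(size=2000, random_state=rng)

xi_f, label_f = classify_gev_tail(frechet_sample)
xi_g, label_g = classify_gev_tail(gumbel_sample)
print(f"Frechet-like sample: xi = {xi_f:.3f} ({label_f})")
print(f"Gumbel-like sample:  xi = {xi_g:.3f} ({label_g})")
```

A fixed tolerance is a crude decision rule; a likelihood-ratio test against the Gumbel submodel, or a confidence interval for the shape parameter, would be a more defensible way to separate the two hypotheses on real data.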