State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Tong Deng
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Zhongqi Liufu
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China; Center for Excellence in Animal Evolution and Genetics, The Chinese Academy of Sciences, Kunming, China
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Shijie Wu
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Xueyu Liu
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Changhao Shi
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Bingjie Chen
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China; GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, Guangzhou, China
CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Qichun Cai
Cancer Center, Clifford Hospital, Jinan University, Guangzhou, China
Chenli Liu
CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Mengfeng Li
Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
Miles E Tracy
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China; Department of Ecology and Evolution, University of Chicago, Chicago, United States
A central goal of cancer genomics is to identify, in each patient, all the cancer-driving mutations. Among them, driving point mutations, referred to as cancer-driving nucleotides (CDNs), recur across cancer patients. The companion study shows that the probability of a site receiving i recurrent hits in n patients decreases exponentially with i; hence, any mutation with i ≥ 3 hits in The Cancer Genome Atlas (TCGA) database is a high-probability CDN. This study characterizes the 50–150 CDNs identifiable for each cancer type in TCGA (while anticipating 10 times more undiscovered ones) as follows: (i) CDNs tend to code for amino acids with divergent chemical properties. (ii) At the genic level, more than five times as many CDNs fall on noncanonical as on canonical cancer-driving genes (CDGs). Most undiscovered CDNs are expected to be on unknown CDGs. (iii) CDNs tend to be more widely shared among cancer types than canonical CDGs, mainly because of the higher resolution at the nucleotide level than at the whole-gene level. (iv) Most importantly, among the 50–100 coding-region mutations carried by a cancer patient, 5–8 CDNs are expected, but only 0–2 CDNs have been identified at present. This low level of identification has hampered functional tests and gene-targeted therapy. We show that, by expanding the sample size to 10⁵, most CDNs can be identified. Full CDN identification will then facilitate the design of patient-specific targeting against multiple CDN-harboring genes.
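The reasoning behind the i ≥ 3 threshold, and behind the call for about 10⁵ samples, can be illustrated with a deliberately simplified model. The sketch below is not the companion study's model: it assumes, hypothetically, that every coding site mutates at the same rate, so that hits at a neutral site across n patients follow a Poisson distribution. The inputs (75 coding mutations per patient, 3 × 10⁷ coding sites, a CDN carried by 0.01% of patients) and all function names are illustrative assumptions, not values or code from this study.

```python
import math


def poisson_tail(lam: float, i: int, terms: int = 60) -> float:
    """P(X >= i) for X ~ Poisson(lam).

    The upper tail is summed directly, which avoids the cancellation error of
    1 - CDF when lam is very small.
    """
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(i, i + terms))


def expected_chance_sites(n_patients: int, muts_per_patient: float,
                          coding_sites: float, i: int) -> float:
    """Expected number of neutral sites reaching >= i hits by chance alone.

    Hypothetical assumption: a uniform per-site, per-patient mutation probability.
    """
    lam = n_patients * muts_per_patient / coding_sites  # mean hits per neutral site
    return coding_sites * poisson_tail(lam, i)


def min_threshold(n_patients: int, muts_per_patient: float,
                  coding_sites: float, max_chance: float = 1.0) -> int:
    """Smallest hit count i at which chance recurrences drop below max_chance sites."""
    i = 1
    while expected_chance_sites(n_patients, muts_per_patient, coding_sites, i) >= max_chance:
        i += 1
    return i


def detection_power(n_patients: int, cdn_freq: float, i: int) -> float:
    """Probability that a CDN carried by a fraction cdn_freq of patients shows >= i hits."""
    return poisson_tail(n_patients * cdn_freq, i)


if __name__ == "__main__":
    MUTS, SITES, CDN_FREQ = 75, 3e7, 1e-4  # hypothetical illustrative inputs
    for n in (1_000, 100_000):
        i = min_threshold(n, MUTS, SITES)
        print(f"n={n:>7,}  threshold i>={i}  power={detection_power(n, CDN_FREQ, i):.2g}")
```

Under these toy assumptions, a cohort of 1,000 patients needs a threshold of 3 hits to keep chance recurrences below one site, consistent with the i ≥ 3 rule, but a CDN carried by 0.01% of patients then has only about a 1.5 × 10⁻⁴ chance of being detected. At 100,000 patients the required threshold rises to 7, yet the detection probability for the same CDN climbs to about 0.87. Real mutation rates vary across sites, which this sketch ignores.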