Scientific Reports (Jan 2021)

Precise diagnosis of three top cancers using dbGaP data

  • Xu-Qing Liu,
  • Xin-Sheng Liu,
  • Jian-Ying Rong,
  • Feng Gao,
  • Yan-Dong Wu,
  • Chun-Hua Deng,
  • Hong-Yan Jiang,
  • Xiao-Feng Li,
  • Ye-Qin Chen,
  • Zhi-Guo Zhao,
  • Yu-Ting Liu,
  • Hai-Wen Chen,
  • Jun-Liang Li,
  • Yu Huang,
  • Cheng-Yao Ji,
  • Wen-Wen Liu,
  • Xiao-Hu Luo,
  • Li-Li Xiao

DOI
https://doi.org/10.1038/s41598-020-80832-x
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 8

Abstract

This study takes on the challenge of decoding information about complex diseases hidden in the huge number of single nucleotide polymorphism (SNP) genotypes, using data from five dbGaP studies. Genome-wide association studies have successfully identified many high-risk SNPs associated with diseases, but precise diagnostic models for complex diseases based on these or other SNP genotypes are still unavailable in the literature. We report that lung cancer, breast cancer and prostate cancer, the top three cancers worldwide, can be predicted precisely from 240–370 SNPs, with accuracy of up to 99% under leave-one-out and 10-fold cross-validation. Our findings (1) confirm an early conjecture of Dr. Mitchell H. Gail that about 300 SNPs are needed to improve risk forecasts for breast cancer, (2) reveal the striking fact that SNP genotypes may contain almost all the information one wants to know, and (3) show the promising possibility that complex diseases can be precisely diagnosed from SNP genotypes alone, without using phenotypic features. In short, information hidden in SNP genotypes can be extracted efficiently to make precise diagnoses of complex diseases.
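
The two evaluation schemes named in the abstract, leave-one-out and 10-fold cross-validation, can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not describe their model, so a nearest-centroid classifier stands in as a hypothetical placeholder, and the genotypes (coded 0, 1 or 2 minor alleles per SNP) are synthetic. All function names and parameters below are assumptions made for illustration.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x.

    Hypothetical stand-in classifier, not the model used in the paper.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def cross_val_accuracy(X, y, k):
    """Fraction of samples predicted correctly when each fold is held out once."""
    correct = 0
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        for i in fold:
            correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

def make_synthetic_genotypes(n_samples=60, n_snps=30, seed=1):
    """Toy data: cases (label 1) carry the minor allele more often than controls."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        maf = 0.6 if label else 0.3  # assumed allele frequencies, illustration only
        # Each genotype is the count of minor alleles across two chromosomes.
        X.append([(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)])
        y.append(label)
    return X, y

X, y = make_synthetic_genotypes()
acc_10fold = cross_val_accuracy(X, y, k=10)     # 10-fold cross-validation
acc_loo = cross_val_accuracy(X, y, k=len(X))    # leave-one-out: one sample per fold
print(f"10-fold accuracy: {acc_10fold:.2f}, leave-one-out accuracy: {acc_loo:.2f}")
```

Leave-one-out is simply the limiting case in which the number of folds equals the number of samples, which is why a single function covers both schemes here.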