PLoS ONE (Jan 2014)

A prostate cancer model build by a novel SVM-ID3 hybrid feature selection method using both genotyping and phenotype data from dbGaP.

  • Sait Can Yücebaş,
  • Yeşim Aydın Son

DOI
https://doi.org/10.1371/journal.pone.0091404
Journal volume & issue
Vol. 9, no. 3
p. e91404

Abstract

Read online

Through Genome Wide Association Studies (GWAS) many Single Nucleotide Polymorphism (SNP)-complex disease relations can be investigated. The output of GWAS can be high in amount and high dimensional, also relations between SNPs, phenotypes and diseases are most likely to be nonlinear. In order to handle high volume-high dimensional data and to be able to find the nonlinear relations we have utilized data mining approaches and a hybrid feature selection model of support vector machine and decision tree has been designed. The designed model is tested on prostate cancer data and for the first time combined genotype and phenotype information is used to increase the diagnostic performance. We were able to select phenotypic features such as ethnicity and body mass index, and SNPs those map to specific genes such as CRR9, TERT. The performance results of the proposed hybrid model, on prostate cancer dataset, with 90.92% of sensitivity and 0.91 of area under ROC curve, shows the potential of the approach for prediction and early detection of the prostate cancer.