A Prostate Cancer Model Build by a Novel SVM-ID3 Hybrid Feature Selection Method Using Both Genotyping and Phenotype Data from dbGaP

Creative Commons License

Yucebas S. C., Aydin Son Y.

PLOS ONE, vol.9, 2014 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 9
  • Publication Date: 2014
  • DOI Number: 10.1371/journal.pone.0091404
  • Journal Name: PLOS ONE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Middle East Technical University Affiliated: Yes


Genome-Wide Association Studies (GWAS) make it possible to investigate many relations between Single Nucleotide Polymorphisms (SNPs) and complex diseases. GWAS output can be both large in volume and high dimensional, and the relations between SNPs, phenotypes, and diseases are most likely nonlinear. To handle high-volume, high-dimensional data and to find these nonlinear relations, we used data mining approaches and designed a hybrid feature selection model that combines a support vector machine and a decision tree. The model was tested on prostate cancer data, and for the first time combined genotype and phenotype information was used to increase diagnostic performance. We were able to select phenotypic features such as ethnicity and body mass index, as well as SNPs that map to specific genes such as CRR9 and TERT. The performance of the proposed hybrid model on the prostate cancer dataset, with a sensitivity of 90.92% and an area under the ROC curve of 0.91, shows the potential of the approach for the prediction and early detection of prostate cancer.
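
The two performance measures quoted in the abstract, sensitivity and area under the ROC curve, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the numbers they produce are unrelated to the results reported in the paper.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True positive rate, TP / (TP + FN), where 1 marks a case and 0 a control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined without positive samples")
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation.

    The AUC equals the probability that a randomly chosen case receives a
    higher score than a randomly chosen control; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four cases (1) followed by four controls (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"sensitivity = {sensitivity(y_true, y_pred):.4f}")
print(f"AUC         = {roc_auc(y_true, scores):.4f}")
```

Sensitivity depends on the chosen threshold, while the AUC summarizes ranking quality across all thresholds, which is why the two figures are usually reported together.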