An integrative approach to structured SNP prioritization and representative SNP selection for genome-wide association studies


Thesis Type: Doctorate

Institution Where the Thesis Was Conducted: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Informatics Institute, Department of Health Informatics, Turkey

Thesis Approval Date: 2011

Student: GÜRKAN ÜSTÜNKAR

Advisor: YEŞİM AYDIN SON

Abstract:

Single nucleotide polymorphisms (SNPs) are the most frequent genomic variations and the main basis for genetic differences among individuals and for many diseases. Because microarrays and advanced sequencing technologies now make it possible to genotype millions of SNPs at once, SNPs are becoming more popular as genomic biomarkers. Like other high-throughput research techniques, genome-wide association studies (GWAS) of SNPs usually hit a bottleneck once statistical analysis has identified the significantly associated SNPs, as there is no standardized approach to prioritizing SNPs or to selecting representative SNPs that show association with the conditions under study. In this study, a Java-based integrated system has been constructed that uses major public databases to prioritize SNPs according to their biological relevance and statistical significance. The Analytic Hierarchy Process (AHP) has been used for objective prioritization of SNPs, and an emerging methodology for second-wave analysis of genes and pathways related to disease-associated SNPs, based on a combined p-value approach, has been incorporated into the prioritization scheme. Using the subset of SNPs that is most representative of all SNPs associated with the disease under study reduces the computational power required for analysis and lowers the cost of subsequent association and biomarker discovery studies. In addition to the proposed prioritization system, we have developed a novel feature selection method based on Simulated Annealing (SA) for representative SNP selection. The validity and accuracy of the developed model have been tested on a real-life case-control data set, and the model produced biologically meaningful results. The integrated desktop application developed in our study will facilitate reliable identification of SNPs involved in the etiology of complex diseases, ultimately supporting timely identification of genomic disease biomarkers and the development of personalized medicine approaches and targeted drug discovery.
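
To illustrate how the Analytic Hierarchy Process can turn pairwise judgments about criteria into a SNP ranking, the following is a minimal sketch only, not the system described in the thesis. It assumes just the two criteria named in the abstract (statistical significance and biological relevance), derives their weights with the row geometric mean approximation of the AHP priority vector, and uses hypothetical SNP identifiers, pairwise judgments, and per-criterion scores. The function names `ahp_weights` and `prioritize` are likewise invented for this example.

```python
import math


def ahp_weights(pairwise):
    """Approximate the AHP priority vector with the row geometric mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j; the matrix is expected to be reciprocal.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def prioritize(snp_scores, weights):
    """Rank SNPs by the weighted sum of their per-criterion scores (descending)."""
    combined = {
        snp: sum(w * s for w, s in zip(weights, scores))
        for snp, scores in snp_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical judgment: statistical significance is rated twice as
# important as biological relevance.
pairwise = [
    [1.0, 2.0],  # statistical significance vs. (itself, biological relevance)
    [0.5, 1.0],  # biological relevance vs. (statistical significance, itself)
]
weights = ahp_weights(pairwise)

# Hypothetical normalized scores per SNP, in the same criterion order:
# (statistical significance, biological relevance).
snp_scores = {
    "snp_A": (0.9, 0.2),
    "snp_B": (0.4, 0.9),
}
ranked = prioritize(snp_scores, weights)

for snp, score in ranked:
    print(f"{snp}: {score:.4f}")
```

In a fuller AHP application, each criterion would typically be broken down into sub-criteria with their own comparison matrices, and the consistency of the judgments would be checked before the weights are used.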