Determination of the effect of polyadenylation SLR values on microarray data classification


Thesis Type: Master's

Institution: Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Turkey

Thesis Approval Date: 2014

Student: ÜMİT ASLAN

Supervisor: TOLGA CAN

Abstract:

Microarray data classification is generally used to predict the outcomes of unknown samples with models built from preprocessed, categorized microarray data containing gene expression values. The preparation of microarray experiments, the design of Affymetrix chips, and the availability of previous microarray experiments make it possible to extract a new kind of data: the differential expression of proximal and distal probes (Short to Long Ratio, or SLR, values), which are used to predict alternative polyadenylation (APA) events. In this thesis, we integrate gene expression data with these SLR values and determine how microarray data classification is affected by this integration. Because of the filtering operations applied while predicting APA events, SLR values are not available for all probe sets on a microarray sample. These missing values are not left out, either while integrating the data or while applying the classification techniques. Three classification techniques, Support Vector Machines (SVM), Decision Tree (J48), and Random Forest, are applied to primary breast tumor microarray data before and after integrating gene expression values with SLR values, and the resulting classification accuracies for metastasis are obtained. The results show that APA events have an incontrovertible impact on gene expression classification, mostly towards improved accuracy.
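
The integration step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The function names `integrate` and `to_arff_row`, the probe set identifiers, the numeric values, and the feature naming scheme are all hypothetical and chosen only for illustration. The sketch assumes that each probe set contributes one expression feature and one SLR feature, and that a missing SLR value is kept as an explicit marker rather than being dropped. Writing the marker as `?` follows the missing-value convention of the ARFF data format; whether the thesis used that format is an assumption.

```python
import math


def integrate(expression, slr):
    """Combine expression and SLR values for one sample (hypothetical helper).

    expression: dict mapping probe set id -> expression value
    slr: dict mapping probe set id -> SLR value; probe sets removed by
         filtering are simply absent from this dict
    Returns a dict of feature name -> value, with NaN marking a missing SLR.
    """
    features = {}
    for probe_set in sorted(expression):
        features[f"{probe_set}_expr"] = expression[probe_set]
        # Keep the SLR feature even when no value is available.
        features[f"{probe_set}_slr"] = slr.get(probe_set, math.nan)
    return features


def to_arff_row(features, label):
    """Render one sample as a comma-separated row, writing NaN as '?'."""
    cells = ["?" if math.isnan(v) else f"{v:g}" for v in features.values()]
    return ",".join(cells + [label])


# Made-up illustrative values, not data from the thesis.
sample_expression = {"probe_A": 7.2, "probe_B": 5.1}
sample_slr = {"probe_A": 1.4}  # probe_B has no SLR value after filtering

row = integrate(sample_expression, sample_slr)
print(to_arff_row(row, "metastasis"))
```

In this sketch the sample keeps all four feature columns, so samples with and without SLR values share the same feature layout and can be passed to a classifier that accepts missing values.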