A classification algorithm using Mahalanobis distance clustering of data with applications on biomedical data sets


Thesis Type: Master's

Institution: Middle East Technical University, Faculty of Engineering, Department of Industrial Engineering, Turkey

Thesis Approval Date: 2011

Student: BAHADIR DURAK

Advisor: CEM İYİGÜN

Abstract:

The concept of classification has been used and examined by the scientific community for hundreds of years. Over this period, many different methods and algorithms have been developed and applied. Although the classification algorithms in the literature use different methods, they rest on a similar basis: assigning data to classes by using defined properties or, put differently, establishing a relationship between known features and an unknown result. This study aims to bring a different perspective to this common basis. Not only the basic features of the data are used; the class of each data point is also included as a parameter, so that the information carried by a known value is exploited in the algorithm. In other words, the class to which a data point belongs is treated as an input, and the data set is transferred to a higher-dimensional space, which becomes the new working environment. In this new environment the problem is no longer a classification problem but a clustering problem. Although this logic is similar to that of kernel methods, the two approaches differ in how they transform the working space. The clusters found in the new space, based on calculations performed with the Mahalanobis distance, are then evaluated in the original space with two different heuristics: a center-based algorithm and a KNN-based algorithm. Both heuristics achieve higher classification success rates with this methodology. For the center-based algorithm, which is more sensitive to the new input parameter, an improvement of up to 8% is observed.
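
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and the abstract does not specify these details, so the following are assumptions made only for illustration: the class label is appended as a single numeric coordinate scaled by a hypothetical `label_weight`; clusters start as one cluster per class and go through a single Mahalanobis reassignment step; a small ridge term `reg` keeps each covariance matrix invertible; and the center-based heuristic drops the label coordinate and uses Euclidean distance to the cluster centers in the original space. All function names are hypothetical.

```python
import numpy as np

def augment(X, y, label_weight=1.0):
    """Append the (scaled) class label as an extra coordinate (assumed encoding)."""
    return np.hstack([X, label_weight * np.asarray(y, dtype=float).reshape(-1, 1)])

def mahalanobis_sq(Z, center, cov, reg=1e-6):
    """Squared Mahalanobis distance from each row of Z to `center`.
    The ridge `reg` is an assumption that keeps the covariance invertible."""
    diff = Z - center
    inv = np.linalg.inv(cov + reg * np.eye(cov.shape[0]))
    return np.einsum("ij,jk,ik->i", diff, inv, diff)

def assign_clusters(Z, centers, covs):
    """Assign each augmented point to the cluster with the smallest Mahalanobis distance."""
    d = np.column_stack([mahalanobis_sq(Z, c, S) for c, S in zip(centers, covs)])
    return d.argmin(axis=1)

def summarize(Z, y, assign):
    """Per-cluster center, covariance, and majority class."""
    centers, covs, labels = [], [], []
    for j in np.unique(assign):
        mask = assign == j
        centers.append(Z[mask].mean(axis=0))
        covs.append(np.cov(Z[mask], rowvar=False))
        labels.append(np.bincount(y[mask]).argmax())
    return np.array(centers), covs, np.array(labels)

def predict_center_based(X_new, centers, labels):
    """Drop the label coordinate and return the class of the nearest center (Euclidean)."""
    d = np.linalg.norm(X_new[:, None, :] - centers[None, :, :-1], axis=2)
    return labels[d.argmin(axis=1)]

# Toy data: two synthetic classes, not data from the thesis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)

Z = augment(X, y)                                   # higher-dimensional working space
centers, covs, labels = summarize(Z, y, assign=y)   # start from one cluster per class
assign = assign_clusters(Z, centers, covs)          # one Mahalanobis reassignment step
centers, covs, labels = summarize(Z, y, assign)
pred = predict_center_based(np.array([[0.0, 0.0], [5.0, 5.0]]), centers, labels)
```

In this toy setting every cluster is pure, so the label coordinate has zero variance within a cluster and the reassignment step leaves the clusters unchanged. A KNN-based variant could, under the same assumptions, vote among the nearest training points in the original space instead of comparing against cluster centers.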