Thesis Type: Doctorate
Institution: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Faculty of Engineering, Department of Industrial Engineering, Turkey
Thesis Approval Date: 2011
Student: TÜLİN İNKAYA
Advisor: NUR EVİN ÖZDEMİREL
Abstract: In this dissertation, we consider the clustering problem in data sets with an unknown number of clusters that have arbitrary shapes and both intracluster and intercluster density variations. We introduce a clustering methodology composed of three methods, which handle the extraction of local density and connectivity properties, data set reduction, and clustering, respectively. The first method constructs a unique neighborhood for each data point using the connectivity and density relations among the points, based on graph-theoretic concepts, mainly Gabriel graphs. Connected neighborhoods then form subclusters (closures), which constitute the skeleton of the clusters. In the second method, the external shape concept from computational geometry is adapted for data set reduction and cluster visualization. This method extracts the external shape of a non-convex n-dimensional data set using Delaunay triangulation. In the third method, we investigate the applicability of Swarm Intelligence to clustering using Ant Colony Optimization (ACO). Ants explore the data set so that clusters are detected using density break-offs, connectivity, and distance information. The proposed ACO-based algorithm uses the outputs of the neighborhood construction (NC) and external shape formation methods. In addition, we propose a three-phase clustering algorithm that consists of NC, outlier detection, and merging phases. We test the strengths and weaknesses of the proposed approaches through extensive experimentation with data sets taken from the literature and data sets generated in a controlled manner. NC is found to be effective for arbitrarily shaped clusters with intracluster and intercluster density variations. The external shape formation algorithm achieves significant reductions for convex clusters. The ACO-based and three-phase clustering algorithms show promising results for data sets with well-separated clusters.
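The abstract names Gabriel graphs as the main graph-theoretic building block of the neighborhood construction method. As a rough illustration of that concept only, the following minimal sketch computes Gabriel graph edges by brute force from the standard definition: two points are neighbors when no third point lies strictly inside the ball whose diameter is the segment joining them. This is not the dissertation's NC algorithm, which also uses density information. The function names `squared_distance` and `gabriel_edges` and the toy data are assumptions made for this example.

```python
from itertools import combinations


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gabriel_edges(points):
    """Return index pairs (i, j), i < j, that are Gabriel neighbors.

    Points i and j are joined when no third point lies strictly inside
    the ball whose diameter is the segment between them. A point k is
    strictly inside that ball exactly when
    d(i, k)^2 + d(j, k)^2 < d(i, j)^2,
    so the check needs only pairwise squared distances.
    Brute force: O(n^3) time, intended for small illustrative inputs.
    """
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = squared_distance(points[i], points[j])
        blocked = any(
            squared_distance(points[i], points[k])
            + squared_distance(points[j], points[k]) < d_ij
            for k in range(len(points))
            if k != i and k != j
        )
        if not blocked:
            edges.append((i, j))
    return edges


# Hypothetical toy data: point 2 sits almost midway between points 0 and 1,
# so it blocks the direct edge (0, 1).
sample_points = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
sample_edges = gabriel_edges(sample_points)
```

For the toy data, `sample_edges` is `[(0, 2), (1, 2)]`: the long edge between points 0 and 1 is dropped because point 2 falls inside its diameter ball. The same test works unchanged in any dimension, which matches the n-dimensional setting the abstract describes.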