Data mining for regional and graph-structured data objects


Tezin Türü: Doktora

Tezin Yürütüldüğü Kurum: Orta Doğu Teknik Üniversitesi, Mühendislik Fakültesi, Endüstri Mühendisliği Bölümü, Türkiye

Tezin Onay Tarihi: 2019

Tezin Dili: İngilizce

Öğrenci: DERYA DİNLER

Asıl Danışman (Eş Danışmanlı Tezler İçin): Mustafa Kemal Tural

Eş Danışman: Nur Evin Özdemirel

Özet:

Three research problems are addressed in this study. The first one is a semi-supervised clustering problem with instance-level constraints where each data object is either a closed convex bounded polytope or a closed disk. We first model the problem of computing the centroid of a given cluster as a second order cone programming problem. Also, a subgradient algorithm is adopted for its faster solution. We then propose a mixed-integer second order cone programming formulation and six heuristic approaches for the considered clustering problem. Finally, we compare solution approaches in terms of computational time and quality on randomly generated and real life datasets. The second problem deals with mining a single graph to find central group of nodes of the graph. For the identification of central nodes, we utilize the group betweenness centrality (GBC) measure. We propose a method that first computes upper and lower bounds on the GBC of several groups. The method then eliminates groups with upper bounds that are lower than the maximum lower bound obtained to find candidates for the optimal group. Finally, an approximating or the optimal group can be returned to the user. We conduct computational experiments with randomly generated and real life networks to test the performance of the proposed method. The last problem is the clustering of m-ary trees where nodes are unweighted, edges are unweighted or weighted, and node correspondence is known. To measure the distance between two trees, we utilize vertex/edge overlap (VEO) and graph edit distance (GED) measures from the literature. To find a representative (centroid) tree for a given set of trees, we propose exact and heuristic solution approaches. We test our algorithms on randomly generated and real life datasets.