New heuristics for performance improvement of ILP-based concept discovery systems


Thesis Type: Master's

Institution: Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Turkey

Thesis Approval Date: 2015

Student: ABDULLAH DOĞAN

Advisor: PINAR KARAGÖZ

Abstract:

A large amount of the valuable data in daily life is stored in relational databases. The accumulation of so much information motivates the need to extract valuable patterns from relational databases. ILP-based concept discovery systems use background knowledge and a set of target examples, stored in multiple tables, to produce hypotheses. The many arguments across these tables lead to large search spaces during hypothesis construction, which causes computational efficiency problems. In this thesis, we focus on concept discovery systems that use an Apriori-based specialization operator and work directly on relational tables. The running time of these ILP systems is directly proportional to the number of queries executed on the DBMS. Most of these queries compute the support and confidence of candidate concept rules generated in the search space. We aim to improve time efficiency by reducing the number of queries these systems run. In particular, we worked on the Concept Rule Induction System (CRIS), which uses Apriori-based specialization in hypothesis construction. The methods we propose generate the same solutions as CRIS, so they improve efficiency without reducing accuracy. In the first method, we prune concept descriptors using support coverage sets. CRIS already stores these sets for memoization; our method reuses them to prune the search space as well. In the second pruning method, we build a cosine similarity matrix over the attributes of each predicate in a pre-processing step. During the specialization of concept descriptors, we prune the search space using this similarity matrix. Finally, we examine the applicability of the NoSQL system MongoDB and the NewSQL system VoltDB as storage for the ILP system CRIS.
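To make the second pruning idea more concrete, the following is a minimal sketch of a cosine similarity matrix built over attributes in a pre-processing step. The abstract does not say how attributes are turned into vectors, so this sketch assumes each attribute is represented by the frequency counts of its values. The attribute names, the toy data, the `may_join` helper, and the 0.5 threshold are all hypothetical and are not taken from CRIS.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(columns):
    """Build a pairwise similarity matrix.

    columns maps an attribute name to the list of values in that column.
    Assumption: an attribute is vectorized as its value-frequency counts.
    """
    vectors = {name: Counter(values) for name, values in columns.items()}
    names = sorted(vectors)
    return {(a, b): cosine(vectors[a], vectors[b]) for a in names for b in names}


def may_join(matrix, a, b, threshold=0.5):
    """Hypothetical pruning test: keep a candidate that links attributes
    a and b only if their value distributions are similar enough."""
    return matrix[(a, b)] >= threshold


# Hypothetical toy relations, not data from the thesis.
columns = {
    "parent.child": ["ann", "bob", "cid"],
    "parent.parent": ["bob", "cid", "dan"],
    "person.age": [31, 58, 80],
}
matrix = similarity_matrix(columns)

for pair in [("parent.child", "parent.parent"), ("parent.child", "person.age")]:
    print(pair, round(matrix[pair], 3), may_join(matrix, *pair))
```

Under these assumptions, a candidate that joins two attributes with no values in common scores 0 and can be discarded before any support or confidence query is sent to the DBMS, which is the kind of query saving the abstract targets.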