Parallel CLOSET+ algorithm for finding frequent closed itemsets


Thesis Type: Master's

Institution: Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Turkey

Thesis Approval Date: 2009

Student: TAYFUN ŞEN

Co-Supervisors: İSMAİL HAKKI TOROSLU, CEVAT ŞENER

Abstract:

Data mining is proving to be a very important field as the amount of available data grows exponentially, driven first by computerization and now by the spread of the Internet. At the same time, cluster computing systems built from commodity hardware are becoming widespread, along with multicore processor architectures. This computing power can be combined with data mining to process huge amounts of data and to extract information and knowledge. Frequent itemset mining holds a special place within data mining because it is an integral part of many types of data mining tasks. It is often a prerequisite for other data mining algorithms, most notably those in the area of association rule mining, and for this reason it has been studied heavily in the literature. In this thesis, a parallel implementation of CLOSET+, a frequent closed itemset mining algorithm, is presented. The CLOSET+ algorithm has been modified to run on multiple processors simultaneously in order to obtain results faster. The Open MPI and Boost libraries have been used for communication between processes, and the program has been tested with different inputs and parameters. Experimental results show that the algorithm exhibits high speedup and efficiency on dense data when the support value is above a certain threshold. The proposed parallel algorithm could prove useful in application areas where a fast response is needed for a low to medium number of frequent closed itemsets. One such application area is the Web, where online applications have similar requirements.
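For readers unfamiliar with the term, the following is a minimal Python sketch of what a frequent closed itemset is: an itemset whose support meets the minimum threshold and which has no proper superset with the same support. It is a brute-force illustration only, not the CLOSET+ algorithm and not the parallel implementation described in the thesis. The function name `closed_frequent_itemsets`, the sample transactions, and the threshold are hypothetical and chosen purely for illustration.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all frequent closed itemsets.

    Brute-force enumeration, exponential in the number of distinct items.
    This is an illustration of the definition, not CLOSET+.
    """
    items = sorted(set().union(*transactions))

    # Count the support of every candidate itemset and keep the frequent ones.
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            count = sum(1 for t in transactions if itemset <= t)
            if count >= min_support:
                support[itemset] = count

    # An itemset is closed if no proper superset has the same support.
    # Any such superset would itself be frequent, so checking the
    # frequent itemsets is sufficient.
    return {
        itemset: count
        for itemset, count in support.items()
        if not any(itemset < other and count == support[other] for other in support)
    }


# Hypothetical toy data: four transactions over the items a, b, c, d.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("c")]
result = closed_frequent_itemsets(transactions, min_support=2)
```

In this toy data, {a} and {b} are frequent but not closed, because {a, b} occurs in exactly the same three transactions. Reporting only closed itemsets therefore gives a more compact result without losing support information.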