Searching documents with semantically related keyphrases


Thesis Type: Master's

Institution: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Faculty of Engineering, Department of Computer Engineering, Turkey

Approval Date: 2010

Student: İBRAHİM AYGÜL

Advisor: FEHİME NİHAN ÇİÇEKLİ

Abstract:

In this thesis, we developed SemKPSearch, a tool for searching documents by keyphrases that are semantically related to a given query phrase. By relating keyphrases semantically, we aim to provide users with extended search and browsing capabilities over a document collection and to increase the number of related results returned for a keyphrase query. Keyphrases provide a brief summary of a document's content; they can be either assigned by the author or extracted automatically from the document. SemKPSearch uses SemKPIndexes, which are generated from the keyphrases of the documents. A SemKPIndex is a keyphrase index extended with a keyphrase-to-keyphrase index that stores the semantic relation scores between the keyphrases in the document collection. The semantic relation score between two keyphrases is calculated with a metric that takes into account the similarity scores between their words. The semantic similarity score between two words is determined with the help of two word-to-word semantic similarity metrics, namely the metric of Wu & Palmer and the metric of Li et al. SemKPSearch was evaluated by human evaluators, all of whom are computer engineers. For the evaluation, keyphrase indexes were created both from the author-assigned keyphrases and from keyphrases extracted automatically with KEA, a state-of-the-art keyphrase extraction algorithm.
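The pipeline summarized above (word-to-word similarity, a keyphrase-to-keyphrase relation score, and query expansion through the extended index) can be illustrated with a minimal sketch. The two word metrics follow their usual published forms: Wu & Palmer scores two words as 2 * depth(LCS) / (depth(w1) + depth(w2)), and Li et al. combine the path length l and the subsumer depth h as exp(-alpha * l) * tanh(beta * h), with alpha = 0.2 and beta = 0.45 as commonly cited values. Everything else is an assumption, not the thesis's implementation: a hypothetical toy is-a taxonomy stands in for WordNet, and the phrase-level aggregation (each word's best match in the other phrase, averaged in both directions), the 0.7 threshold, the toy documents, and the helper names `phrase_relation`, `build_kp2kp_index`, and `search` are illustrative only, since the abstract does not give the exact keyphrase relation formula.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent), standing in for WordNet.
PARENT = {
    "entity": None,
    "abstraction": "entity",
    "method": "abstraction",
    "search": "method",
    "retrieval": "search",
    "document": "abstraction",
    "text": "document",
    "index": "document",
}

def path_to_root(word):
    """Return [word, parent, ..., root] for a word in the taxonomy."""
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path

def depth(word):
    """Depth counted in nodes, so the root has depth 1 (a convention choice)."""
    return len(path_to_root(word))

def lcs(w1, w2):
    """Lowest common subsumer: first ancestor of w1 that is also an ancestor of w2."""
    ancestors2 = set(path_to_root(w2))
    for node in path_to_root(w1):
        if node in ancestors2:
            return node
    raise ValueError("words share no common ancestor")

def wu_palmer(w1, w2):
    # Wu & Palmer: 2 * depth(LCS) / (depth(w1) + depth(w2))
    return 2 * depth(lcs(w1, w2)) / (depth(w1) + depth(w2))

def li_et_al(w1, w2, alpha=0.2, beta=0.45):
    # Li et al.: exp(-alpha * l) * tanh(beta * h), where l is the path length
    # between the words through their LCS and h is the depth of the LCS.
    common = lcs(w1, w2)
    path_len = (depth(w1) - depth(common)) + (depth(w2) - depth(common))
    return math.exp(-alpha * path_len) * math.tanh(beta * depth(common))

def word_sim(w1, w2, metric):
    """Assumption: identical words score 1.0, words missing from the taxonomy 0.0."""
    if w1 == w2:
        return 1.0
    if w1 not in PARENT or w2 not in PARENT:
        return 0.0
    return metric(w1, w2)

def phrase_relation(p1, p2, metric=wu_palmer):
    """Hypothetical keyphrase relation score: each word's best match in the
    other phrase, averaged per phrase, then averaged over both directions."""
    words1, words2 = p1.split(), p2.split()

    def directed(a, b):
        return sum(max(word_sim(x, y, metric) for y in b) for x in a) / len(a)

    return (directed(words1, words2) + directed(words2, words1)) / 2

def build_kp2kp_index(keyphrases, metric=wu_palmer, threshold=0.7):
    """Keyphrase-to-keyphrase index: for each keyphrase, the other keyphrases
    scoring at or above the (assumed) threshold, best first."""
    index = {}
    for a in keyphrases:
        scored = [(b, phrase_relation(a, b, metric)) for b in keyphrases if b != a]
        index[a] = sorted((pair for pair in scored if pair[1] >= threshold),
                          key=lambda pair: pair[1], reverse=True)
    return index

def search(query, kp_to_docs, kp2kp):
    """Documents indexed by the query keyphrase plus those of its related keyphrases."""
    results = set(kp_to_docs.get(query, ()))
    for related, _score in kp2kp.get(query, ()):
        results |= kp_to_docs[related]
    return results

# Toy keyphrase index: keyphrase -> set of document ids (hypothetical data).
KP_TO_DOCS = {
    "document retrieval": {"d1"},
    "text search": {"d2"},
    "index": {"d3"},
}
KP2KP = build_kp2kp_index(list(KP_TO_DOCS))

print(sorted(search("document retrieval", KP_TO_DOCS, KP2KP)))  # ['d1', 'd2', 'd3']
```

Here a query for "document retrieval" returns d1 directly and d2 and d3 through its related keyphrases, which is the kind of result expansion described above. Passing `metric=li_et_al` to `build_kp2kp_index` changes the scores, and therefore which keyphrases clear the threshold, without changing the rest of the pipeline.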