A framework for ranking and categorizing medical documents


Tezin Türü: Doktora

Tezin Yürütüldüğü Kurum: Orta Doğu Teknik Üniversitesi, Enformatik Enstitüsü, Bilişim Sistemleri Anabilim Dalı, Türkiye

Tezin Onay Tarihi: 2010

Öğrenci: MOHAMMED GH. I. AL ZAMİL

Eş Danışman: AYSU BETİN CAN, NAZİFE BAYKAL

Özet:

In this dissertation, we present a framework to enhance the retrieval, ranking, and categorization of text documents in medical domain. The contributions of this study are the introduction of a similarity model to retrieve and rank medical textdocuments and the introduction of rule-based categorization method based on lexical syntactic patterns features. We formulate the similarity model by combining three features to model the relationship among document and construct a document network. We aim to rank retrieved documents according to their topics; making highly relevant document on the top of the hit-list. We have applied this model on OHSUMED collection (TREC-9) in order to demonstrate the performance effectiveness in terms of topical ranking, recall, and precision metrics. In addition, we introduce ROLEX-SP (Rules Of LEXical Syntactic Patterns); a method for the automatic induction of rule-based text-classifiers relies on lexical syntactic patterns as a set of features to categorize text-documents. The proposed method is dedicated to solve the problem of multi-class classification and feature imbalance problems in domain specific text documents. Furthermore, our proposed method is able to categorize documents according to a predefined set of characteristics such as: user-specific, domain-specific, and query-based categorization which facilitates browsing documents in search-engines and increase users ability to choose among relevant documents. To demonstrate the applicability of ROLEX-SP, we have performed experiments on OHSUMED (categorization collection). The results indicate that ROLEX-SP outperforms state-of-the-art methods in categorizing short-text medical documents.