Ontology based information extraction on free text radiological reports using natural language processing approach


Tezin Türü: Doktora

Tezin Yürütüldüğü Kurum: Orta Doğu Teknik Üniversitesi, Enformatik Enstitüsü, Bilişim Sistemleri Anabilim Dalı, Türkiye

Tezin Onay Tarihi: 2010

Öğrenci: ERGİN SOYSAL

Danışman: NAZİFE BAYKAL

Özet:

This thesis describes an information extraction system that is designed to process free text Turkish radiology reports in order to extract and convert the available information into a structured information model. The system uses natural language processing techniques together with domain ontology in order to transform the verbal descriptions into a target information model, so that they can be used for computational purposes. The developed domain ontology is effectively used in entity recognition and relation extraction phases of the information extraction task. The ontology provides the flexibility in the design of extraction rules, and the structure of the ontology also determines the information model that describes the structure of the extracted semantic information. In addition, some of the missing terms in the sentences are identified with the help of the ontology. One of the main contributions of this thesis is the usage of ontology in information extraction that increases the expressive power of extraction rules and helps to determine missing items in the sentences. The system is the first information extraction system for Turkish texts. Since Turkish is a morphologically rich language, the system uses a morphological analyzer and the extraction rules are also based on the morphological features. TRIES achieved 93% recall and 98% precision results in the performance evaluations.