A study on alternative lexicalizations in Turkish Discourse Bank


Thesis Type: Master's

Institution: Middle East Technical University, Graduate School of Informatics, Department of Cognitive Science, Turkey

Thesis Approval Date: 2015

Student: FİKRET GÜNAY

Supervisor: DENİZ ZEYREK BOZŞAHİN

Abstract:

Discourse relations connect two pieces of discourse, called arguments, and represent the relationship between them. They can be expressed either explicitly or implicitly. The objective of the present thesis is to identify alternative lexicalizations (ALTLEXs), a type of implicit relation, in the Turkish Discourse Bank (TDB) by means of a corpus-based approach. The thesis contributes to our understanding of Turkish discourse by revealing a set of ALTLEXs. Three methods are employed: a) ALTLEXs are annotated in TDB. In this procedure, 10% of the entire TDB (20 files, approximately 20,000 words) is first doubly annotated; the ALTLEXs discovered are then searched for and annotated in the entire TDB. Inter-annotator agreement (IAA) is calculated to check the reliability of the annotations. b) A lexico-syntactic classification of Turkish ALTLEXs is carried out, in which the ALTLEXs are classified into three groups: the closed class, the partially open class, and the open-ended category. c) Since the open-ended category had too few instances, an automatic extraction method is developed to extract more possible open-ended ALTLEXs. Using all these methods, the thesis finds a total of 94 types (297 tokens) of ALTLEXs in Turkish. This set of ALTLEXs will contribute to the enrichment of TDB with more annotations and help pave the way for new research.
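
The abstract does not say which measure was used for inter-annotator agreement. As an illustration only, the sketch below computes Cohen's kappa, one common choice for two annotators, over hypothetical per-item labels. The function name `cohen_kappa`, the label strings, and the sample data are assumptions made for this example and are not taken from the thesis.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: each list holds one categorical label per item, in the
    same item order. This is an illustrative measure, not necessarily
    the one used in the thesis.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, the product of the two
    # annotators' marginal proportions, summed over all labels.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    all_labels = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout;
        # kappa is undefined, so report perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four candidate spans (not thesis data).
annotator_1 = ["ALTLEX", "NONE", "ALTLEX", "NONE"]
annotator_2 = ["ALTLEX", "NONE", "NONE", "NONE"]
print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one label (here, the hypothetical `NONE`) dominates the data.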