Thesis Type: Doctorate
Institution Where the Thesis Was Conducted: Middle East Technical University, Informatics Institute, Department of Cognitive Science, Turkey
Thesis Approval Date: 2015
Abstract: This thesis presents a methodology for an overall assessment of the Turkish Discourse Bank (TDB), a linguistic resource in which discourse relations overtly expressed by discourse connectives have been identified and annotated together with the two arguments they relate. We provide a quantitative and qualitative assessment of the TDB in order to establish the reliability of this discourse resource for Turkish, and we suggest that our methodology can be used to evaluate the reliability of other annotated corpora. Our quantitative evaluation consists of calculating in-depth statistical measures using the Kappa statistic together with additional evaluation measures originally used in evaluating information retrieval systems. A two-way methodology for calculating the agreement statistics is proposed: a Common Arguments approach and an Overall approach. Although the Overall approach is effective on its own, we propose a comparison of the two approaches, which makes it possible to pinpoint sources of disagreement more accurately. As part of our qualitative evaluation, we present a novel effort to automatically identify, in any given Turkish text, discursive uses of phrasal expressions, which have been annotated systematically alongside explicit discourse connectives in the TDB. Our cascaded model achieves full recall, provides 99.95% accuracy, and can be used to effortlessly enlarge the coverage of the TDB.