An inter-annotator agreement measurement methodology for the Turkish Discourse Bank.


Thesis Type: Master's Thesis

Institution: Orta Doğu Teknik Üniversitesi (Middle East Technical University), Turkey

Approval Date: 2010

Language: English

Student: Şaban İhsan Yalçınkaya

Supervisor: Deniz Zeyrek Bozşahin

Abstract:

In annotation efforts for corpora like the Turkish Discourse Bank (TDB), which are built on the intuitions of the annotators, the reliability of the corpus can only be determined through a correct inter-annotator agreement measurement methodology (Artstein & Poesio, 2008). In this thesis, a methodology was defined to measure the inter-annotator agreement among the TDB annotators. The statistical tests and agreement coefficients that are widely used in scientific communities, including Cochran's Q test (1950), Fleiss' Kappa (1971), and Krippendorff's Alpha (1995), were examined in detail. The inter-annotator agreement measurement approaches of various corpus annotation efforts were scrutinized in terms of their reported statistical results, and none of the reported approaches proved statistically appropriate for the TDB. Therefore, a comprehensive inter-annotator agreement measurement methodology was designed from scratch. A computer program, the Rater Agreement Tool (RAT), was developed to perform statistical measurements on the TDB with different corpus parameters and data handling approaches. It was concluded that Krippendorff's Alpha is the most appropriate statistical method for the TDB. The measurements were found to be affected by the choice of data handling approach as well as by the agreement statistic used, and there is no single correct approach; rather, several approaches are valid for different research considerations. For the TDB, the major data handling suggestions that emerged are: (1) treating words as the building blocks of the annotations, and (2) using the interval approach when partial disagreements should be weighted, and the boundary approach when all disagreements should be evaluated in the same way.
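
To illustrate the kind of statistic the thesis settles on, the minimal sketch below computes Krippendorff's Alpha via the standard coincidence-matrix formulation, once with a nominal distance (every disagreement counts the same, in the spirit of the boundary approach) and once with an interval distance (nearby values disagree less, in the spirit of weighting partial disagreements). The example data, function names, and distance functions are illustrative assumptions; this is not the thesis's RAT tool.

```python
from itertools import permutations

def krippendorff_alpha(units, distance):
    """units: list of units; each unit is the list of values assigned by the
    annotators who rated it (missing ratings are simply omitted).
    distance: delta(c, k) -> float, the disagreement between two values."""
    # Units with fewer than two ratings contribute no pairable information.
    units = [u for u in units if len(u) >= 2]

    # Coincidence matrix: each ordered pair of values inside a unit of size m
    # contributes 1 / (m - 1), so every unit adds m to the grand total n.
    coincidences = {}
    for unit in units:
        m = len(unit)
        for c, k in permutations(unit, 2):
            coincidences[(c, k)] = coincidences.get((c, k), 0.0) + 1.0 / (m - 1)

    # Marginal totals n_c and grand total n.
    totals = {}
    for (c, _k), count in coincidences.items():
        totals[c] = totals.get(c, 0.0) + count
    n = sum(totals.values())

    # Observed and expected disagreement; alpha = 1 - D_o / D_e.
    d_o = sum(count * distance(c, k) for (c, k), count in coincidences.items()) / n
    d_e = sum(totals[c] * totals[k] * distance(c, k)
              for c in totals for k in totals) / (n * (n - 1))
    return 1.0 - d_o / d_e if d_e > 0 else 1.0

nominal  = lambda c, k: 0.0 if c == k else 1.0   # all disagreements weigh the same
interval = lambda c, k: float(c - k) ** 2        # partial disagreements weigh less

# Hypothetical example: three annotators marking a boundary (token index) in
# four units, with one missing rating in the last unit.
ratings = [[12, 12, 13], [7, 7, 7], [30, 32, 31], [5, 9]]
print("nominal alpha: ", round(krippendorff_alpha(ratings, nominal), 3))
print("interval alpha:", round(krippendorff_alpha(ratings, interval), 3))
```

Running the two variants on the same ratings shows why the data handling choice matters: the nominal (boundary-style) distance treats 30 vs. 31 as exactly as bad as 5 vs. 9, while the interval distance discounts near misses, so the two coefficients can differ noticeably on the same annotations.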