Computational aspects of discourse annotation


Thesis Type: Postgraduate

Institution Of The Thesis: Middle East Technical University, Turkey

Approval Date: 2008

Thesis Language: English

Student: Berfin Aktaş

Supervisor: HÜSEYİN CEM BOZŞAHİN

Abstract:

In this thesis, we aim to analyze the computational aspects of discourse annotation. Discourse is not merely a concatenation of sentences; the totality of a discourse is more than the sum of the sentences that constitute it. The property that differentiates discourse from a set of arbitrary sentences is defined as coherence. Coherence is established by the relations between the parts of a discourse. We take a lexicalized approach to discourse; in this study, therefore, discourse relations are considered to be set up by lexical items called discourse connectives. Systematic analysis of coherence requires an annotated corpus in which coherence relations are encoded. We developed an annotation environment to be used in an ongoing discourse-level annotation project which aims to generate a theory-neutral source of coherence relations. We followed a data-driven methodology in the design of the data structure employed in the annotation software. For this reason, we examined the predicate-argument structure of connectives. This analysis shows that the stand-off annotation technique is more suitable than an inline method for such an annotation environment. This thesis also includes a brief discussion of the formal implications of coherence relation constructions.
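
To illustrate the contrast between stand-off and inline annotation drawn in the abstract, the following is a minimal Python sketch and not the data structure used in the thesis. It assumes that a relation is stored separately from the raw text as lists of character offsets. The names `Relation`, `span_of`, `text_of`, `arg1` and `arg2`, as well as the example sentence, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A span is a half-open pair of character offsets [start, end) into the raw text.
Span = Tuple[int, int]


@dataclass(frozen=True)
class Relation:
    """One connective with its two arguments, kept apart from the text.

    Each part is a list of spans, so a part may be discontinuous.
    Field names are illustrative assumptions, not taken from the thesis.
    """
    connective: List[Span]
    arg1: List[Span]
    arg2: List[Span]


def span_of(raw: str, phrase: str) -> Span:
    """Return the offsets of the first occurrence of phrase in raw."""
    start = raw.index(phrase)
    return (start, start + len(phrase))


def text_of(raw: str, spans: List[Span]) -> str:
    """Recover the annotated text, marking gaps between spans with ' ... '."""
    return " ... ".join(raw[start:end] for start, end in spans)


# The raw text is never modified; annotations only point into it.
RAW = "John, although he was tired, kept working."

relation = Relation(
    connective=[span_of(RAW, "although")],
    # This argument is interrupted by the connective and the other argument.
    arg1=[span_of(RAW, "John"), span_of(RAW, "kept working")],
    arg2=[span_of(RAW, "he was tired")],
)

if __name__ == "__main__":
    print(text_of(RAW, relation.connective))
    print(text_of(RAW, relation.arg1))
    print(text_of(RAW, relation.arg2))
```

Because the annotation only refers to offsets, an argument that is split around the connective can be recorded as two spans, and several relations can point at overlapping stretches of the same text. An inline scheme would have to embed tags in the text itself, where discontinuous or overlapping spans are awkward to express.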