Thesis Type: Postgraduate
Institution Of The Thesis: Middle East Technical University, Turkey
Approval Date: 2015
Thesis Language: English
Student: Halil Ağın
Supervisor: CENGİZ ACARTÜRK
Open Archive Collection: AVESIS Open Access Collection
Abstract: This thesis takes the distributional semantics (frequency-based semantics) approach as the theoretical framework for quantifying textual coherence. Distributional semantics describes discourse sections as vectors whose dimensions are the frequency counts of co-occurring words in the text within its semantic space. It quantifies textual coherence by measuring the cosine values of the vectors of successive sentences (cf. Latent Semantic Analysis, LSA). The common assumption underlying LSA-based studies is that the frequency of word co-occurrence can be used as a cohesive cue to quantify textual coherence, thus leading to analyses based on a term-document matrix. This thesis considers the spatial distance of co-occurring words as a new frequency event of cohesive cues and introduces a document-distance matrix, which is derived from the term-document matrix. It proposes that the document-distance matrix representation of co-occurring words in adjacent sentences of a text can be used to quantify textual coherence. Two mathematical functions are suggested for deriving the document-distance matrix, and two algorithms are suggested for the operations on it. The mathematical functions operate on the document-document matrix (a derivation of the term-document matrix) to derive the document-distance matrix. The algorithms measure the coherence of a text by operating on the newly introduced document-distance matrices.
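
To illustrate the baseline idea the abstract refers to (cosine values of the vectors of successive sentences), the following is a minimal sketch only. It assumes raw term-frequency vectors without the dimensionality reduction that LSA applies, and it does not reproduce the document-distance functions or algorithms proposed in the thesis, which the abstract does not specify. The function names (`sentence_vectors`, `cosine`, `adjacent_cosines`, `coherence_score`), the tokenization rule, and the choice of averaging are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def sentence_vectors(sentences):
    """Build one term-frequency vector per sentence.

    Each Counter plays the role of one column of a term-document matrix,
    with sentences treated as the "documents" (an assumption of this sketch).
    """
    return [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]


def cosine(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty sentence shares nothing with its neighbour
    return dot / (norm_u * norm_v)


def adjacent_cosines(sentences):
    """Cosine value for every pair of successive sentences."""
    vecs = sentence_vectors(sentences)
    return [cosine(a, b) for a, b in zip(vecs, vecs[1:])]


def coherence_score(sentences):
    """Mean cosine over successive sentence pairs (hypothetical summary score)."""
    scores = adjacent_cosines(sentences)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    text = [
        "The cat sat on the mat.",
        "The cat then left the mat.",
        "Stock prices fell sharply today.",
    ]
    print(adjacent_cosines(text))
    print(coherence_score(text))
```

In this sketch the first two sentences share several words and so receive a positive cosine, while the last pair shares none and receives zero. Averaging the pairwise values is only one possible way to summarize them.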