Person name recognition in Turkish financial texts by using local grammar approach


Thesis Type: Master's

Institution: Middle East Technical University (Orta Doğu Teknik Üniversitesi), Informatics Institute, Department of Information Systems, Turkey

Thesis Approval Date: 2007

Student: ÖZKAN BAYRAKTAR

Co-Supervisors: TUĞBA TAŞKAYA TEMİZEL, NAZİFE BAYKAL

Abstract:

Named entity recognition (NER) is the task of identifying the named entities (NEs) in texts and classifying them into semantic categories such as person, organization, and place names, and time, date, monetary, and percent expressions. NER thus has two principal aims: identifying NEs and classifying them into semantic categories. The local grammar (LG) approach has recently been shown to have an advantage over other NER techniques, such as the probabilistic, symbolic, and hybrid approaches, in that it can work with untagged corpora. Unlike most other NER systems, the LG approach does not require dictionaries or gazetteers, which are lists of proper nouns (PNs) used in NER applications. As a consequence, it can recognize NEs in previously unseen texts at minimal cost. Most NER systems are costly because their rules are compiled manually, especially when large tagged corpora are involved. They also require semantic and syntactic analyses to be applied before the pattern generation process, which the LG approach avoids. In this thesis, we tried to acquire LGs for person names from a large untagged corpus of Turkish financial news, using an approach that H. N. Traboulsi recently applied successfully to a Reuters corpus of English financial news. We explored its applicability to the Turkish language by using frequency, collocation, and concordance analyses. In addition, we constructed a list of Turkish reporting verbs. This list is an important part of the study because there is no major study of reporting verbs in Turkish.
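
The concordance step mentioned in the abstract can be illustrated with a small sketch. It is not the procedure used in the thesis: the toy sentences, the two reporting verbs, the window size, and the function names `concordance`, `capitalized_runs`, and `candidate_names` are all assumptions made for illustration. The idea shown is only that, in an untagged token stream, capitalized word sequences found in the left context of a reporting verb can be collected as candidate person names without any gazetteer.

```python
from collections import Counter

def concordance(tokens, keyword, window=4):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((left, tok, right))
    return hits

def capitalized_runs(words):
    """Join maximal runs of consecutive capitalized tokens into single strings."""
    runs, current = [], []
    for w in words:
        if w[:1].isupper():
            current.append(w)
        else:
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs

def candidate_names(tokens, cue_words, window=4):
    """Count capitalized runs appearing in the left context of any cue word."""
    counts = Counter()
    for cue in cue_words:
        for left, _, _ in concordance(tokens, cue, window):
            counts.update(capitalized_runs(left))
    return counts

# Invented toy corpus (not from the thesis corpus), already tokenized.
tokens = [
    "Ali", "Kaya", "faizlerin", "düşeceğini", "söyledi", ".",
    "Ayşe", "Demir", "büyümenin", "süreceğini", "belirtti", ".",
]
# Hypothetical cue list; the thesis's own reporting-verb list is not reproduced here.
reporting_verbs = {"söyledi", "belirtti"}

names = candidate_names(tokens, reporting_verbs, window=4)
for name, freq in names.most_common():
    print(name, freq)
```

Because Turkish usually places the verb at the end of the clause, this sketch looks only at the left context of each cue. On a real corpus the capitalization test would also pick up sentence-initial words and organization names, so the counts would serve as a starting point for manual pattern inspection rather than as final output.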