Head finalization and morphological analysis in factored phrase-based statistical machine translation from English to Turkish


Thesis Type: Master's

Institution: Orta Doğu Teknik Üniversitesi (Middle East Technical University), Faculty of Engineering, Department of Computer Engineering, Turkey

Approval Date: 2015

Language: English

Student: Haydar İmren

Supervisor: Ruket Çakıcı

Abstract:

Machine Translation (MT) is the field of study concerned with automatically translating text from one natural language into another. Statistical Machine Translation (SMT) produces translations using statistical models trained on bilingual text corpora. This study introduces an approach for translating from English to Turkish. Turkish is an agglutinative language with free constituent order, whereas English is not agglutinative and has a strict constituent order. In addition to these differences, parallel corpora for this language pair are scarce, which makes SMT between English and Turkish a challenging problem. To date, most research on this language pair has proposed representing both languages at the morpheme level. This study differs in that it not only represents English and Turkish at the morpheme level but also applies a reordering technique called Head Finalization, which has been used successfully for other languages that are grammatically similar to Turkish. Results are reported using the BLEU metric. With the improvements in reordering and morpheme-level representation, the BLEU score increased from a baseline of 19.62 to 30.93, a gain of 11.31 points, or a relative increase of about 57.6%. The same approach can be applied to other languages that are close to Turkish in word order, morphological structure, and suffixation.
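To make the idea behind Head Finalization concrete, here is a minimal sketch that assumes the English sentence is already available as a toy dependency tree. The `Node` class and the `head_finalize` function are hypothetical names introduced for illustration only; they are not taken from the thesis. The sketch moves each head to the end of its own phrase while keeping its dependents in their original order, which turns English head-initial order (SVO, prepositions) into a head-final order closer to Turkish (SOV, postpositions and suffixes). Real implementations work on full parser output and add exceptions (for example for coordinated phrases), so this is a simplified stand-in, not the exact rule set used in this study.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A token in a hypothetical dependency tree.

    `children` holds the token's dependents in their original surface order.
    """
    word: str
    children: list[Node] = field(default_factory=list)


def head_finalize(node: Node) -> list[str]:
    """Linearize a subtree so that every head follows all of its dependents.

    Simplified assumption: dependents keep their relative order and only the
    head is moved to the end of its own phrase (applied recursively).
    """
    words: list[str] = []
    for child in node.children:
        # Each dependent's subtree is itself head-finalized first.
        words.extend(head_finalize(child))
    # The head comes last, mimicking a head-final language such as Turkish.
    words.append(node.word)
    return words


if __name__ == "__main__":
    # "John reads a book": the verb "reads" heads "John" and "book".
    clause = Node("reads", [Node("John"), Node("book", [Node("a")])])
    print(" ".join(head_finalize(clause)))  # John a book reads

    # "in the house": the preposition "in" becomes a postposition.
    phrase = Node("in", [Node("house", [Node("the")])])
    print(" ".join(head_finalize(phrase)))  # the house in
```

The reordered output "John a book reads" follows the subject-object-verb order of Turkish "John bir kitap okuyor", and "the house in" mirrors how Turkish marks location with a suffix after the noun ("evde"). Reordering the source side this way before training is meant to reduce the long-distance reordering the SMT system must learn on its own.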