A Turkish database for psycholinguistic studies: A corpus-based study on frequency, age of acquisition, and imageability


Thesis Type: Postgraduate

Institution Of The Thesis: Orta Doğu Teknik Üniversitesi, Graduate School of Informatics, Cognitive Science, Turkey

Approval Date: 2015

Student: ELİF AHSEN TOLGAY

Co-Supervisors: HÜSEYİN CEM BOZŞAHİN, DENİZ ZEYREK BOZŞAHİN

Abstract:

Psycholinguistic databases are reliable and practical sources for research, since they provide standardized stimuli for scientific studies. The objective of the present thesis is to initiate a Turkish psycholinguistic database. In addition to quantitative variables (number of letters, etc.), three variables are included: frequency, age of acquisition (AoA), and imageability. Frequency values are extracted from two sources: a child literature corpus (CLC) created for the purposes of the current thesis, and a web-based corpus that represents adult language use (BOUN Corpus). Imageability ratings are collected from an adult population with a questionnaire. The main research aim of the thesis is to compare two methods of obtaining AoA values: collecting rated AoA with a questionnaire administered to an adult population, and comparing frequencies from adult and child language corpora. First, the frequency counts from CLC are compared to child speech frequencies. The two appear to be correlated; CLC is therefore considered a suitable source for acquisition data. Afterwards, frequency counts from CLC are compared to BOUN Corpus frequencies to obtain AoA data. The frequency values from both corpora, the AoA values obtained from the questionnaire, and the imageability values are brought together to create the Turkish psycholinguistic database.
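
The abstract does not state how the child and adult corpus frequencies are compared, so the following is only an illustrative sketch of one possible approach, not the method used in the thesis. It assumes each corpus is available as a list of tokens, normalizes counts to occurrences per million, and computes a smoothed log ratio per word. The function names (per_million, log_ratio), the smoothing parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter


def per_million(tokens):
    """Return each word's frequency as occurrences per million tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def log_ratio(child_tokens, adult_tokens, smoothing=1.0):
    """Log2 ratio of child to adult per-million frequency for every word.

    Positive values mean the word is relatively more frequent in the
    child corpus. `smoothing` (an assumption of this sketch) is added to
    both frequencies so words missing from one corpus do not cause a
    division by zero.
    """
    child = per_million(child_tokens)
    adult = per_million(adult_tokens)
    vocabulary = set(child) | set(adult)
    return {
        word: math.log2(
            (child.get(word, 0.0) + smoothing)
            / (adult.get(word, 0.0) + smoothing)
        )
        for word in vocabulary
    }


# Toy data standing in for a child corpus and an adult corpus (made up).
child_corpus = "top kedi top anne kedi top".split()
adult_corpus = "ekonomi anne ekonomi kedi hükümet ekonomi".split()

scores = log_ratio(child_corpus, adult_corpus)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:+.2f}")
```

Under this sketch, words with a high positive score are candidates for early acquisition, and such scores could then be set against the questionnaire-based AoA ratings. Whether a log ratio, a rank difference, or some other measure is appropriate depends on the corpora and is left open here.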