Modeling phoneme durations and fundamental frequency contours in Turkish speech


Thesis Type: Doctorate

Institution: Middle East Technical University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Turkey

Thesis Approval Date: 2005

Student: ÖZLEM ÖZTÜRK

Supervisor: TOLGA ÇİLOĞLU

Abstract:

The term prosody refers to characteristics of speech such as intonation, timing, loudness, and other acoustic properties imposed by the physical, intentional, and emotional state of the speaker. Phone durations and fundamental frequency contours are considered two of the most prominent aspects of prosody. This thesis studies the modeling of phone durations and fundamental frequency contours in Turkish speech. Various methods exist for building prosody models, and the state of the art is dominated by corpus-based methods. This study introduces corpus-based approaches that use classification and regression trees to discover the relationships between prosodic attributes and phone durations or fundamental frequency contours. In this context, a speech corpus designed to have specific phonetic and prosodic content has been recorded and annotated. A set of prosodic attributes is compiled; its elements are determined based on linguistic studies and literature surveys. The relevance of the prosodic attributes is investigated using statistical measures such as mutual information and information gain. Fundamental frequency contour modeling and phone duration modeling are handled as independent problems. Phone durations are predicted using regression trees, where the set of prosodic attributes is formed by forward selection. Quantization of phone durations is studied to improve prediction quality. A two-stage duration prediction process is proposed for handling specific ranges of phone duration values. Scaling and shifting of the predicted durations are proposed to minimize the mean squared error. Fundamental frequency contour modeling is studied under two different frameworks. One of them generates a codebook of syllable fundamental frequency contours by vector quantization. The codewords are used to predict sentence fundamental frequency contours. Pitch accent prediction by two different