Segmental Duration Modeling in Turkish


Ozturk O., ÇİLOĞLU T.

9th International Conference on Spoken Language Processing/INTERSPEECH 2006, Pennsylvania, United States of America, 1 January 2006, pp. 2378-2380

  • Publication Type: Conference Paper / Full-Text Paper
  • City of Publication: Pennsylvania
  • Country of Publication: United States of America
  • Page Numbers: pp. 2378-2380
  • Affiliated with Middle East Technical University: Yes

Abstract

The naturalness of synthetic speech depends heavily on appropriate modeling of prosody. Three prosodic components are most commonly modeled: segmental duration, pitch contour, and intensity. In this study, we present our work on modeling segmental duration in Turkish using machine-learning algorithms, in particular Classification and Regression Trees (CART). The models predict phone durations from attributes such as phone identity, neighboring phone identities, lexical stress, position of the syllable in the word, part-of-speech (POS) information, word length in number of syllables, and position of the word in the utterance, all extracted from a speech corpus of approximately 700 sentences. The resulting models predict segment durations better than mean-duration approximations (approximately 0.77 correlation coefficient, CC, and 20.4 ms root-mean-squared error, RMSE). The attributes phone identity, neighboring phone identities, lexical stress, syllable type, POS, phrase break information, and location of the word in the phrase constitute the best predictor set for phoneme duration modeling.
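
The abstract compares models against a mean-duration approximation using CC and RMSE. The following is a minimal sketch of how such a baseline and the two metrics could be computed. It is not the authors' CART model or their evaluation code: the function names are hypothetical, the baseline is assumed to be a per-phone mean, and the toy durations are invented for illustration only.

```python
import math
from collections import defaultdict


def rmse(predicted, observed):
    """Root-mean-squared error between two equal-length sequences (ms)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def correlation(predicted, observed):
    """Pearson correlation coefficient between predictions and observations."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_p * var_o)


def fit_mean_baseline(samples):
    """Assumed baseline: predict each phone's mean training duration.

    samples is a list of (phone, duration_ms) pairs.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for phone, duration in samples:
        totals[phone][0] += duration
        totals[phone][1] += 1
    return {phone: total / count for phone, (total, count) in totals.items()}


# Invented toy data (phone label, duration in ms); not from the paper's corpus.
train = [("a", 80.0), ("a", 100.0), ("k", 60.0), ("k", 40.0),
         ("s", 110.0), ("s", 90.0)]
test = [("a", 95.0), ("k", 55.0), ("s", 105.0), ("a", 85.0)]

baseline = fit_mean_baseline(train)
predicted = [baseline[phone] for phone, _ in test]
observed = [duration for _, duration in test]

baseline_rmse = rmse(predicted, observed)
baseline_cc = correlation(predicted, observed)
print(f"baseline RMSE = {baseline_rmse:.1f} ms, CC = {baseline_cc:.3f}")
```

A tree-based model such as the CART described above would be scored with the same two functions, so that its CC and RMSE can be compared directly against the baseline on the same held-out segments.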