Turkish speech corpora and recognition tools developed by porting SONIC: Towards multilingual speech recognition

Salor, Ozgul; Pellom, Bryan; Ciloglu, Tolga; Demirekler, Mübeccel

doi:10.1016/j.csl.2007.01.001

Salor O., Pellom B. L., Ciloglu T., Demirekler M.

COMPUTER SPEECH AND LANGUAGE, vol.21, no.4, pp.580-593, 2007 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 21 Issue: 4
Publication Date: 2007
DOI: 10.1016/j.csl.2007.01.001
Journal Name: COMPUTER SPEECH AND LANGUAGE
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp.580-593
Keywords: phonetic aligner, phone recognizer, language porting, speech corpora, speech recognition
Middle East Technical University Affiliated: Yes

Abstract

This paper presents work on developing speech corpora and recognition tools for Turkish by porting SONIC, a speech recognition tool developed initially for English at the Center for Spoken Language Research of the University of Colorado at Boulder. The work had two objectives. The first was to collect a standard phonetically-balanced Turkish microphone speech corpus for general research use. A 193-speaker triphone-balanced audio corpus and a pronunciation lexicon for Turkish have been developed. The corpus was accepted for distribution by the Linguistic Data Consortium (LDC) of the University of Pennsylvania in October 2005, and it will serve as a standard corpus for Turkish speech researchers. The second objective was to develop speech recognition tools (a phonetic aligner and a phone recognizer) for Turkish, which provided a starting point for obtaining a multilingual speech recognizer by porting SONIC to Turkish. This part of the work was the first port of this particular recognizer to a language other than English; subsequently, SONIC has been ported to over 15 languages. Using the phonetic aligner developed, the audio corpus has been provided with word, phone and HMM-state level alignments. For the phonetic aligner, it is shown that 92.6% of the automatically labeled phone boundaries are placed within 20 ms of manually labeled locations for the Turkish audio corpus. Finally, a phone recognition error rate of 29.2% is demonstrated for the phone recognizer. (c) 2007 Elsevier Ltd. All rights reserved.
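
The two figures quoted in the abstract (boundary agreement within 20 ms and phone error rate) can be illustrated with a minimal Python sketch. The abstract does not describe the scoring protocol, so everything here is an assumption: boundaries are taken to be already paired one-to-one and expressed in integer milliseconds, and the phone error rate is taken to be the common edit-distance definition (substitutions, deletions and insertions divided by the reference length). The function names `boundary_agreement` and `phone_error_rate` and the sample data are hypothetical and are not part of SONIC.

```python
from typing import Sequence


def boundary_agreement(auto_ms: Sequence[int], manual_ms: Sequence[int],
                       tol_ms: int = 20) -> float:
    """Fraction of automatic boundaries within tol_ms of the manual ones.

    Assumes the two lists are already paired one-to-one (same phone sequence).
    """
    if len(auto_ms) != len(manual_ms) or not auto_ms:
        raise ValueError("boundary lists must be non-empty and of equal length")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tol_ms)
    return hits / len(auto_ms)


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Levenshtein distance between phone sequences, divided by len(ref).

    This is an assumed, conventional definition, not necessarily the paper's.
    """
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i]  # deleting the first i reference phones
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# Illustrative data only (not taken from the corpus).
auto = [100, 250, 400, 610]
manual = [105, 240, 430, 600]
agreement = boundary_agreement(auto, manual)

ref_phones = ["m", "e", "r", "h", "a", "b", "a"]
hyp_phones = ["m", "e", "r", "a", "b", "a"]
per = phone_error_rate(ref_phones, hyp_phones)
```

With the sample data, three of the four boundaries fall within the 20 ms tolerance, and the hypothesis differs from the reference by one deleted phone out of seven.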