Turkish speech corpora and recognition tools developed by porting SONIC: Towards multilingual speech recognition


Salor O., Pellom B. L., Ciloglu T., Demirekler M.

COMPUTER SPEECH AND LANGUAGE, vol.21, no.4, pp.580-593, 2007 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 21 Issue: 4
  • Publication Date: 2007
  • Doi Number: 10.1016/j.csl.2007.01.001
  • Journal Name: COMPUTER SPEECH AND LANGUAGE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.580-593
  • Keywords: phonetic aligner, phone recognizer, language porting, speech corpora, speech recognition
  • Middle East Technical University Affiliated: Yes

Abstract

This paper presents work on developing speech corpora and recognition tools for Turkish by porting SONIC, a speech recognition tool developed initially for English at the Center for Spoken Language Research of the University of Colorado at Boulder. The work had two objectives. The first was to collect a standard phonetically-balanced Turkish microphone speech corpus for general research use. A 193-speaker triphone-balanced audio corpus and a pronunciation lexicon for Turkish have been developed. The corpus was accepted for distribution by the Linguistic Data Consortium (LDC) of the University of Pennsylvania in October 2005, and it will serve as a standard corpus for Turkish speech researchers. The second objective was to develop speech recognition tools (a phonetic aligner and a phone recognizer) for Turkish, which provided a starting point for obtaining a multilingual speech recognizer by porting SONIC to Turkish. This part of the work was the first port of this particular recognizer to a language other than English; subsequently, SONIC has been ported to over 15 languages. Using the phonetic aligner developed, the audio corpus has been provided with word, phone and HMM-state level alignments. For the phonetic aligner, it is shown that 92.6% of the automatically labeled phone boundaries are placed within 20 ms of manually labeled locations for the Turkish audio corpus. Finally, a phone recognition error rate of 29.2% is demonstrated for the phone recognizer. (c) 2007 Elsevier Ltd. All rights reserved.
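
The boundary-agreement figure quoted in the abstract (the share of automatic phone boundaries that fall within a tolerance of the manual ones) can be illustrated with a minimal sketch. The abstract does not describe the scoring procedure, so everything here is an assumption: the function name `boundary_agreement`, the one-to-one pairing of automatic and manual boundaries, the inclusive tolerance, and the toy timestamps, which are invented for illustration and are not taken from the corpus.

```python
def boundary_agreement(auto_ms, manual_ms, tolerance_ms=20.0):
    """Return the fraction of automatic boundaries lying within
    tolerance_ms of their paired manual boundary.

    Assumes both lists describe the same phone sequence, so the
    i-th automatic boundary corresponds to the i-th manual one.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must be paired one-to-one")
    if not auto_ms:
        raise ValueError("no boundaries to compare")
    hits = sum(
        1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tolerance_ms
    )
    return hits / len(auto_ms)


# Hypothetical boundary times in milliseconds (illustrative only).
manual = [120.0, 310.0, 455.0, 640.0, 800.0]
auto = [125.0, 300.0, 480.0, 642.0, 815.0]

rate = boundary_agreement(auto, manual)
print(f"{rate:.1%} of boundaries within 20 ms")
```

In this toy data the third boundary is off by 25 ms, so four of the five boundaries count as agreeing at the 20 ms tolerance.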