Vocal tract resonances tracking based on voiced and unvoiced speech classification using dynamic programming and fixed interval Kalman smoother


Oezbek I. Y. , Demirekler M.

33rd IEEE International Conference on Acoustics, Speech and Signal Processing, Nevada, United States Of America, 30 March - 04 April 2008, pp.4217-4220

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/icassp.2008.4518585
  • City: Nevada
  • Country: United States Of America
  • Page Numbers: pp.4217-4220

Abstract

This paper presents a systematic framework for accurate estimation of vocal tract resonances (formants) that requires neither training data nor a phonetic transcription. In the proposed method, the speech signal is segmented into voiced and unvoiced parts; for each part, the resonance frequencies of the vocal tract are estimated by dynamic programming and then refined by Kalman filtering/smoothing. The performance of the proposed method is compared with three other methods: a baseline, WaveSurfer [10], and MSR [5]. The proposed method reduces the overall estimation error rate of the vocal tract resonances (F1, F2 and F3) by 35%, 39.6% and 2.74% relative to the baseline, WaveSurfer and MSR methods, respectively.
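
To illustrate the final stage only, the following is a minimal sketch of a fixed-interval (Rauch-Tung-Striebel) Kalman smoother applied to a single formant track. It is not the authors' implementation: the abstract does not give the state model or noise parameters, so the scalar random-walk model, the function name `kalman_rts_smooth`, and the variances `process_var` and `obs_var` are all assumptions chosen for illustration. The voiced/unvoiced segmentation and the dynamic programming stage are not shown; the input is assumed to be per-frame frequency estimates (in Hz) from one segment.

```python
def kalman_rts_smooth(observations, process_var=2500.0, obs_var=10000.0):
    """Fixed-interval (RTS) smoother for a scalar random-walk state.

    Assumed model (illustrative, not taken from the paper):
        x[t] = x[t-1] + w,  w ~ N(0, process_var)
        y[t] = x[t]   + v,  v ~ N(0, obs_var)
    """
    n = len(observations)
    if n == 0:
        return []

    filt_mean, filt_var = [0.0] * n, [0.0] * n
    pred_mean, pred_var = [0.0] * n, [0.0] * n

    # Forward pass: ordinary Kalman filter, initialised from the first frame.
    filt_mean[0], filt_var[0] = float(observations[0]), obs_var
    for t in range(1, n):
        # Predict: under a random walk the mean carries over, variance grows.
        pred_mean[t] = filt_mean[t - 1]
        pred_var[t] = filt_var[t - 1] + process_var
        # Update with the observation at frame t.
        gain = pred_var[t] / (pred_var[t] + obs_var)
        filt_mean[t] = pred_mean[t] + gain * (observations[t] - pred_mean[t])
        filt_var[t] = (1.0 - gain) * pred_var[t]

    # Backward pass: each frame is corrected using all later frames.
    smooth = filt_mean[:]
    for t in range(n - 2, -1, -1):
        c = filt_var[t] / pred_var[t + 1]
        smooth[t] = filt_mean[t] + c * (smooth[t + 1] - pred_mean[t + 1])
    return smooth


# Hypothetical F1 candidates (Hz) for one voiced segment, with one outlier.
track = [500.0, 520.0, 480.0, 900.0, 510.0, 495.0]
smoothed = kalman_rts_smooth(track)
```

Because the smoother runs over the whole segment before producing its output, each frame's estimate draws on both earlier and later frames, which is what distinguishes fixed-interval smoothing from causal filtering. Running it separately per segment, as the abstract describes, would allow different noise settings for voiced and unvoiced speech; the values used here are placeholders.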