SPEECH COMMUNICATION, vol.50, no.7, pp.594-604, 2008 (SCI-Expanded)
The use of articulator motion information in automatic speech segmentation is investigated. Automatic speech segmentation is an essential task in speech processing applications like speech synthesis where accuracy and consistency of segmentation are firmly connected to the quality of synthetic speech. The motions of upper and lower lips are incorporated into a hidden Markov model based segmentation process. The MOCHA-TIMIT database, which involves simultaneous articulatograph and microphone recordings, was used to develop and test the models. Different feature vector compositions are proposed for incorporation of articulator motion parameters to the automatic segmentation system. Average absolute boundary error of the system with respect to manual segmentation is decreased by 10.1%. The results are examined in a boundary class dependent manner using both acoustic and visual phone classes, and the performance of the system in different boundary types is discussed. After analyzing the boundary class dependent performance, the error reduction is increased to 18.0% by using the appropriate feature vectors in selected boundaries. (c) 2008 Elsevier B.V. All rights reserved.