Spectral modification for context-free voice conversion using MELP speech coding framework

Salor O., Demirekler M.

International Symposium on Intelligent Multimedia, Video and Speech Processing, Hong Kong, PEOPLES R CHINA, 20 - 22 October 2004, pp.314-317

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/isimp.2004.1434063
  • City: Hong Kong
  • Country: PEOPLES R CHINA
  • Page Numbers: pp.314-317
  • Middle East Technical University Affiliated: Yes


In this work, we have focused on spectral modification of speech for voice conversion from one speaker to another. Voice conversion aims to modify the speech of one speaker so that the modified speech sounds as if it were spoken by another speaker. The MELP (Mixed Excitation Linear Prediction) speech coding algorithm has been used as the speech analysis and synthesis framework. Using a 230-sentence triphone-balanced database of the two speakers, a mapping between the 4-stage vector quantization indexes for the line spectral frequencies (LSFs) of the two speakers has been obtained. This mapping provides a context-free speech conversion for the spectral properties of the speakers. Two different methods have been proposed to obtain the LSF mapping. The first method determines the corresponding source and target LSF codeword indexes, while the second method finds a new LSF codebook for the target speaker. After the spectral modification, pitch modification is applied to the source speaker's residual to approximate the target speaker's pitch range, and the modified filter is then driven by the modified residual signal. Subjective ABX listening tests have been carried out, and the correct speaker perception rate has been obtained as 80% and 77% for the first and second spectral conversion methods, respectively. For future work, we are planning to integrate our previous work on LPC filter and residual relationship analysis to increase the correct speaker perception rate.
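The abstract does not say how the correspondence between source and target codeword indexes is established, so the following is only an illustrative sketch of one plausible approach for a single quantization stage. It assumes that frames of the two speakers have already been time-aligned (for example by dynamic time warping, which is not mentioned in the abstract) and that each source index is mapped to the target index it co-occurs with most often. The function names `build_index_mapping` and `convert`, the majority-vote rule, and the toy index values are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_index_mapping(source_indexes, target_indexes):
    """Map each source codeword index to its most frequent aligned target index.

    Both arguments are equal-length sequences of VQ indexes for one stage,
    one entry per time-aligned frame pair (alignment is assumed done elsewhere).
    """
    if len(source_indexes) != len(target_indexes):
        raise ValueError("aligned index sequences must have equal length")
    counts = defaultdict(Counter)
    for s, t in zip(source_indexes, target_indexes):
        counts[s][t] += 1
    # Majority vote per source index (an assumed rule, not the authors' method).
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def convert(source_indexes, mapping):
    """Replace source indexes by mapped target indexes.

    Indexes never seen in training pass through unchanged (an assumption).
    """
    return [mapping.get(s, s) for s in source_indexes]


# Toy aligned training data for one VQ stage (made-up values).
train_source = [3, 3, 7, 7, 7, 1]
train_target = [5, 5, 2, 2, 9, 4]

mapping = build_index_mapping(train_source, train_target)
converted = convert([7, 3, 8], mapping)
print(mapping)
print(converted)
```

Because the lookup depends only on the current frame's index and not on neighbouring frames, a table of this kind is context-free in the sense used in the abstract. A full system would need one such table per quantization stage, or a joint treatment of the four stages, which this sketch does not attempt.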