Ses çevirme Türkçe için ilgili konuşma analizi araçları geliştirme


Tezin Türü: Doktora

Tezin Yürütüldüğü Kurum: Orta Doğu Teknik Üniversitesi, Mühendislik Fakültesi, Elektrik ve Elektronik Mühendisliği Bölümü, Türkiye

Tezin Onay Tarihi: 2005

Tezin Dili: İngilizce

Öğrenci: Özgül Salor

Danışman: MÜBECCEL DEMİREKLER

Açık Arşiv Koleksiyonu: AVESİS Açık Erişim Koleksiyonu

Özet:

In this dissertation, new approaches in the design of a voice transformation (VT) system for Turkish are proposed. Objectives in this thesis are two-fold. The first objective is to develop standard speech corpora and segmentation tools for Turkish speech research. The second objective is to consider new approaches for VT. A triphone-balanced set of 2462 Turkish sentences is prepared for analysis. An audio corpus of 100 speakers, each uttering 40 sentences out of the 2462-sentence set, is used to train a speech recognition system designed for English. This system is ported to Turkish to obtain a phonetic aligner and a phoneme recognizer. The triphone-balanced sentence set and the phonetic aligner are used to develop a speech corpus for VT. A new voice transformation approach based on Mixed Excitation Linear Prediction (MELP) speech coding framework is proposed. Multi-stage vector quantization of MELP is used to obtain speaker-specific line-spectral frequency (LSF) codebooks for source and target speakers. Histograms mapping the LSF spaces of source and target speakers are used for transformation in the baseline system. The baseline system is improved by a dynamic programming approach to estimate the target LSFs. As a second approach to the VT problem, quantizing the LSFs using k-means clustering algorithm is applied with dimension reduction of LSFs using principle component analysis. This approach provides speaker-specific codebooks out of the speech corpus instead of using MELP's pre-trained LSF codebook. Evaluations show that both dimension reduction and dynamic programming improve the transformation performance.