Comparative analysis of hidden Markov models for multi-modal dialogue scene indexing

IEEE International Conference on Acoustics, Speech, and Signal Processing, İstanbul, Türkiye, 5-9 June 2000, pp. 2401-2404

Publication Type: Conference Paper / Full Text Paper
DOI: 10.1109/icassp.2000.859325
City of Publication: İstanbul
Country of Publication: Türkiye
Pages: pp. 2401-2404
Affiliated with Middle East Technical University: Yes

Abstract

A class of audio-visual content is segmented into dialogue scenes using the state transitions of a novel hidden Markov model (HMM). Each shot is classified using both its audio track and its visual content to determine the state/scene transitions of the model. In simulations with circular and left-to-right HMM topologies, both are observed to perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
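
To make the difference between the two topologies concrete, the following is a minimal Python sketch and not the model from the paper. It assumes a hypothetical two-state HMM (state 0 = non-dialogue, state 1 = dialogue), a hypothetical binary shot label (0 = no speech and no face, 1 = speech with a face), and made-up probabilities. The helper names `left_to_right`, `circular`, `viterbi`, and `boundaries` are illustrative only. Scene boundaries are read off as the shots where the decoded state changes.

```python
import math


def left_to_right(n, stay=0.7):
    """Transition matrix where a state stays or advances; the last state absorbs."""
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == n - 1:
            trans[i][i] = 1.0
        else:
            trans[i][i] = stay
            trans[i][i + 1] = 1.0 - stay
    return trans


def circular(n, stay=0.7):
    """Same as left-to-right, except the last state wraps around to the first."""
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        trans[i][i] = stay
        trans[i][(i + 1) % n] = 1.0 - stay
    return trans


def _log(p):
    # Log-probability that maps a zero probability to -inf instead of raising.
    return math.log(p) if p > 0.0 else float("-inf")


def viterbi(obs, start, trans, emit):
    """Most likely state sequence for a sequence of discrete observations."""
    n = len(start)
    score = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans[r][s]))
            score.append(prev[best] + _log(trans[best][s]) + _log(emit[s][o]))
            ptr.append(best)
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def boundaries(path):
    """Indices of shots at which the decoded state (scene type) changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


# Hypothetical values, chosen only for illustration.
START = [1.0, 0.0]                 # always begin in the non-dialogue state
EMIT = [[0.8, 0.2], [0.2, 0.8]]    # EMIT[state][shot_label]
SHOTS = [0, 0, 1, 1, 1, 0, 0]      # per-shot labels from an assumed classifier

circular_path = viterbi(SHOTS, START, circular(2), EMIT)
ltr_path = viterbi(SHOTS, START, left_to_right(2), EMIT)
print(circular_path, boundaries(circular_path))
print(ltr_path, boundaries(ltr_path))
```

With these assumed numbers, the circular topology returns to the non-dialogue state after the dialogue shots, while the left-to-right topology cannot leave the dialogue state once it has entered it, so it reports only the first boundary.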