Comparative analysis of hidden Markov models for multi-modal dialogue scene indexing

Alatan A. A., Akansu A., Wolf W.

IEEE International Conference on Acoustics, Speech, and Signal Processing, İstanbul, Türkiye, 5-9 June 2000, pp. 2401-2404

  • DOI: 10.1109/icassp.2000.859325
  • City of Publication: İstanbul
  • Country of Publication: Türkiye
  • Pages: pp. 2401-2404


A class of audio-visual content is segmented into dialogue scenes using the state transitions of a novel hidden Markov model (HMM). Each shot is classified using both the audio track and the visual content to determine the state/scene transitions of the model. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
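
The difference between the two topologies named in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the number of states, the self-transition probability `stay`, the emission matrix `B`, and the integer shot labels in `obs` are all hypothetical values chosen for illustration. The sketch builds a transition matrix for each topology and decodes a sequence of per-shot labels with the standard Viterbi algorithm, so that changes in the decoded state mark candidate scene boundaries.

```python
import numpy as np

def left_to_right(n, stay=0.7):
    """Each state either stays or advances; the last state is absorbing."""
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i] = stay
        A[i, i + 1] = 1.0 - stay
    A[n - 1, n - 1] = 1.0
    return A

def circular(n, stay=0.7):
    """Like left-to-right, but the last state wraps around to the first."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] += stay
        A[i, (i + 1) % n] += 1.0 - stay
    return A

def viterbi(A, B, pi, obs):
    """Most likely state sequence for discrete observations (log domain)."""
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, n = len(obs), A.shape[0]
    delta = np.empty((T, n))
    back = np.zeros((T, n), dtype=int)
    delta[0] = logpi + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: move from i to j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

# Hypothetical setup: 3 states, 3 shot labels, each state favouring one label.
B = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
pi = np.array([1.0, 0.0, 0.0])
obs = [0, 0, 1, 1, 2, 2, 0, 0]  # one label per shot (assumed classifier output)

path_circular = viterbi(circular(3), B, pi, obs)
path_ltr = viterbi(left_to_right(3), B, pi, obs)
print(path_circular)  # [0, 0, 1, 1, 2, 2, 0, 0]
print(path_ltr)       # [0, 0, 1, 1, 2, 2, 2, 2]
```

In this toy setup the circular topology can return to the first state when the shot labels repeat, whereas the left-to-right topology stays in its final state once it has been reached.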