Comparative analysis of hidden Markov models for multi-modal dialogue scene indexing

Alatan A. A., Akansu A., Wolf W.

IEEE International Conference on Acoustics, Speech, and Signal Processing, İstanbul, Turkey, 5-9 June 2000, pp.2401-2404

  • Publication Type: Conference Paper / Full Text
  • DOI: 10.1109/icassp.2000.859325
  • City: İstanbul
  • Country: Turkey
  • Page Numbers: pp.2401-2404
  • Middle East Technical University Affiliated: Yes


A class of audio-visual content is segmented into dialogue scenes using the state transitions of a novel hidden Markov model (HMM). Each shot is classified using both its audio track and visual content to determine the state/scene transitions of the model. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that combining audio and face information yields the most consistent results across the observation sets.
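The two HMM topologies compared in the paper differ only in the structure of their state-transition matrices: a circular topology lets the final state wrap back to the first (scene types can recur cyclically), while a left-to-right topology only allows states to persist or advance. The sketch below, a hypothetical illustration not taken from the paper, builds both transition structures and decodes a toy shot-label sequence with standard Viterbi decoding; the state count, probabilities, and observation alphabet are illustrative assumptions.

```python
import numpy as np

def circular_transitions(n, p_stay=0.8):
    # Ring topology: each state either stays or advances to the
    # next state modulo n, so the last state can return to the first.
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = p_stay
        A[i, (i + 1) % n] = 1.0 - p_stay
    return A

def left_to_right_transitions(n, p_stay=0.8):
    # Left-to-right topology: states persist or move forward only;
    # the final state is absorbing.
    A = np.zeros((n, n))
    for i in range(n - 1):
        A[i, i] = p_stay
        A[i, i + 1] = 1.0 - p_stay
    A[n - 1, n - 1] = 1.0
    return A

def viterbi(A, B, pi, obs):
    # Standard log-domain Viterbi decoding for a discrete-observation HMM.
    # A: (n, n) transitions, B: (n, m) emissions, pi: (n,) initial probs,
    # obs: sequence of observation symbol indices.
    n = A.shape[0]
    logA = np.log(A + 1e-12)
    logB = np.log(B + 1e-12)
    T = len(obs)
    delta = np.zeros((T, n))
    psi = np.zeros((T, n), dtype=int)
    delta[0] = np.log(pi + 1e-12) + logB[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + logA  # scores[i, j]: from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Toy setup: 3 hidden scene states, 2 shot-level observation symbols
# (e.g. "face detected + speech" vs "no face / music"); all values assumed.
A = circular_transitions(3)
B = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]])
pi = np.full(3, 1.0 / 3.0)
states = viterbi(A, B, pi, [0, 0, 1, 1, 0])
```

In a ring, `A[n-1, 0]` is nonzero, so decoded scene labels can cycle; in the left-to-right variant that entry is zero, which is why the circular topology suits content where dialogue scenes recur rather than progress once.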