Multi-modal dialog scene detection using hidden Markov models for content-based multimedia indexing

Alatan, Abdullah; Akansu, Ali; Wolf, Wayne

doi:10.1023/a:1011395131992

Alatan A. A., Akansu A. N., Wolf W.

Multimedia Tools and Applications, vol. 14, no. 2, pp. 137-151, 2001 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 14 Issue: 2
Publication Date: 2001
DOI: 10.1023/a:1011395131992
Journal Name: Multimedia Tools and Applications
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Pages: pp. 137-151
Affiliated with Middle East Technical University: Yes

Abstract

A class of audio-visual data (fiction entertainment: movies, TV series) is segmented into scenes, which contain dialogs, using a novel hidden Markov model (HMM)-based method. Each shot is classified using both the audio track (via classification of speech, silence, and music) and the visual content (face and location information). The result of this shot-based classification is an audio-visual token that the HMM state diagram uses to perform scene analysis. Simulations with circular and left-to-right HMM topologies show that both perform very well with multi-modal inputs. Moreover, for the circular topology, comparisons between different training and observation sets show that audio and face information together give the most consistent results across observation sets.
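
To make the token-to-scene step more concrete, the following is a minimal sketch of Viterbi decoding over per-shot audio-visual tokens. It is an illustration under stated assumptions, not the authors' model: the state names, the token alphabet (audio class plus face presence), and every probability value are hypothetical, and the abstract does not specify them. Training of the HMM parameters is omitted; the sketch only shows how a circular or left-to-right topology constrains the decoded state sequence.

```python
import math

# Hypothetical state labels; the abstract does not enumerate the model's states.
STATES = ["establishing", "dialog", "transition"]


def make_transitions(n, stay=0.7, circular=True):
    """Build an n x n matrix where each state either stays or moves to the next.

    circular=True lets the last state wrap around to the first;
    circular=False gives a left-to-right chain with an absorbing last state.
    """
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == n - 1 and not circular:
            trans[i][i] = 1.0
            continue
        trans[i][i] = stay
        trans[i][(i + 1) % n] = 1.0 - stay
    return trans


def safe_log(p):
    """Log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens, start, trans, emit):
    """Return the most likely state index sequence for a list of shot tokens."""
    n = len(start)
    score = [safe_log(start[s]) + safe_log(emit[s].get(tokens[0], 0.0))
             for s in range(n)]
    backpointers = []
    for tok in tokens[1:]:
        prev, score, ptr = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + safe_log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + safe_log(trans[best][s])
                         + safe_log(emit[s].get(tok, 0.0)))
        backpointers.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical emission probabilities over "audio+face" tokens (illustrative values).
EMIT = [
    {"music+noface": 0.6, "silence+noface": 0.2, "speech+face": 0.1, "speech+noface": 0.1},
    {"speech+face": 0.7, "speech+noface": 0.15, "silence+noface": 0.1, "music+noface": 0.05},
    {"silence+noface": 0.5, "music+noface": 0.3, "speech+noface": 0.15, "speech+face": 0.05},
]
START = [0.8, 0.1, 0.1]

# One token per shot, as produced by a (hypothetical) shot classifier.
shots = ["music+noface", "speech+face", "speech+face", "speech+noface",
         "speech+face", "silence+noface", "music+noface"]

circular_path = viterbi(shots, START, make_transitions(3, circular=True), EMIT)
ltr_path = viterbi(shots, START, make_transitions(3, circular=False), EMIT)

print([STATES[s] for s in circular_path])
print([STATES[s] for s in ltr_path])
```

Under these assumptions, the circular topology allows the decoder to return to the first state after a scene ends, which suits a long video with many consecutive scenes, while the left-to-right topology models a single pass through one scene.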