MULTIMEDIA SYSTEMS, vol. 24, no. 1, pp. 55-72, 2018 (SCI-Expanded)
In this paper, we propose a multi-modal event recognition framework based on the integration of feature fusion, deep learning, scene classification, and decision fusion. Frames, shots, and scenes are identified through a video decomposition process. Events are modeled using the features of, and the relations between, these physical video parts. Event modeling is achieved through visual concept learning, scene segmentation, and association rule mining. Visual concept learning is employed to bridge the semantic gap between the visual content and the textual descriptors of the events. Association rules are discovered by a specialized association rule mining algorithm in which the proposed strategy integrates temporality into the rule discovery process. In addition to frames, shots, and scenes, the concept of a scene segment is introduced to define and extract the elements of association rules. Diverse feature sources, such as audio, motion, keypoint descriptors, temporal occurrence characteristics, and the fully connected layer outputs of a CNN model, are combined through feature fusion. The proposed decision fusion approach employs logistic regression to express the relation between the dependent variable (event type) and the independent variables (classifier outputs) in terms of decision weights. Multi-modal fusion-based scene classifiers are employed in event recognition. Rule-based event modeling and multi-modal fusion are shown to be promising approaches for event recognition. The decision fusion results are promising, and the proposed algorithm remains open to the fusion of new sources and to the integration of new event types. The accuracy of the proposed methodology is evaluated on the CCV and Hollywood2 datasets, and the results are compared with benchmark implementations from the literature.
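To make the temporally constrained rule mining concrete, the following is a minimal sketch, not the paper's specialized algorithm: it assumes each scene segment yields an ordered sequence of detected visual concepts (the concept names are hypothetical), and it counts a rule a -> b only when concept a is observed before concept b within the same segment.

```python
from collections import Counter

def temporal_rules(segments, min_support=0.3, min_conf=0.6):
    """Toy temporally constrained rule miner: a rule (a -> b) counts
    only when concept a appears before concept b inside the same
    scene segment. An illustrative stand-in, not the paper's algorithm."""
    n = len(segments)
    pair_counts, item_counts = Counter(), Counter()
    for seq in segments:
        seen, seg_pairs = set(), set()
        for concept in seq:
            for earlier in seen:
                if earlier != concept:
                    seg_pairs.add((earlier, concept))
            seen.add(concept)
        pair_counts.update(seg_pairs)   # each pair counted at most once per segment
        item_counts.update(set(seq))
    rules = []
    for (a, b), count in pair_counts.items():
        support, confidence = count / n, count / item_counts[a]
        if support >= min_support and confidence >= min_conf:
            rules.append((a, b, support, confidence))
    return rules

# Hypothetical per-segment concept sequences.
segments = [["crowd", "ball", "cheer"], ["crowd", "cheer"], ["ball", "whistle"]]
print(temporal_rules(segments))  # e.g. [('crowd', 'cheer', 0.67, 1.0)]
```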
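The feature fusion step can likewise be sketched in a few lines, assuming fixed-length descriptors per sample; the modality names and dimensions below are illustrative, not taken from the paper. Each modality vector is normalized and concatenated into a single descriptor for the scene classifiers.

```python
import numpy as np

def fuse_features(audio, motion, keypoints, temporal, cnn_fc):
    """Early (feature-level) fusion: L2-normalize each modality,
    then concatenate into one descriptor. Inputs are 1-D arrays;
    all dimensions are placeholders."""
    parts = [audio, motion, keypoints, temporal, cnn_fc]
    normed = [p / (np.linalg.norm(p) + 1e-12) for p in parts]  # guard against zero vectors
    return np.concatenate(normed)

# Hypothetical dimensions: audio stats, motion histogram, bag of
# keypoint descriptors, temporal-occurrence stats, CNN fc-layer output.
fused = fuse_features(np.random.rand(64), np.random.rand(32),
                      np.random.rand(500), np.random.rand(8),
                      np.random.rand(4096))
print(fused.shape)  # (4700,)
```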
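Finally, the decision fusion idea, learning per-classifier decision weights with logistic regression over the base classifiers' outputs, can be illustrated as follows. This is a hedged sketch using scikit-learn on synthetic placeholder data, not the paper's exact formulation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Decision fusion: each row holds the scores produced by the individual
# scene/event classifiers for one sample; the label is the event outcome.
# Shapes and data are synthetic placeholders.
rng = np.random.default_rng(0)
n_samples, n_classifiers = 200, 5
X = rng.random((n_samples, n_classifiers))   # base classifier output scores
y = rng.integers(0, 2, size=n_samples)       # event present / absent

fuser = LogisticRegression()
fuser.fit(X, y)

# The learned coefficients act as decision weights: how strongly each
# base classifier contributes to the fused event decision.
print(fuser.coef_)                 # decision weights per classifier
print(fuser.predict_proba(X[:3]))  # fused event probabilities
```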