Multimedia data modeling and semantic analysis by multimodal decision fusion


Thesis Type: Doctoral

Institution: Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Turkey

Approval Date: 2015

Student: MENNAN GÜDER

Advisor: FEHİME NİHAN ÇİÇEKLİ

Abstract:

In this thesis, we propose a multi-modal event recognition framework based on the integration of event modeling, fusion, deep learning, and association rule mining. Event modeling is achieved through visual concept learning, scene segmentation, and association rule mining. Visual concept learning is employed to bridge the semantic gap between the visual content and the textual descriptors of the events. Association rules are discovered by a specialized association rule mining algorithm in which the proposed strategy integrates temporality into the rule discovery process. In addition to the physical parts of a video, the concept of a scene segment is proposed to define and extract the elements of association rules. Various feature sources, such as audio, motion, keypoint descriptors, temporal occurrence characteristics, and the fully connected layer outputs of a CNN model, are combined in the feature fusion step. The proposed decision fusion approach employs logistic regression to formulate the relation between the dependent variable (event type) and the independent variables (classifiers' outputs) in terms of decision weights. The main motivation of this thesis is to construct a multimodal fusion system that detects events in video by examining both feature and decision sources. Various feature sets, such as audio, visual, motion, and deep learning features, are investigated. The proposed system employs a decision fusion methodology as the final step of semantic analysis. The main issues investigated throughout this study are robustness to uncertainty, improved event recognition through the use of multi-modal fusion, deep learning outputs, and extracted rules, and flexibility in representation.
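The decision fusion step described above can be sketched as follows. This is a minimal, illustrative implementation of the general idea (logistic regression mapping per-classifier confidence scores to an event decision via learned decision weights), not the thesis's actual system; all function names, hyperparameters, and data are hypothetical.

```python
# Sketch of multimodal decision fusion via logistic regression.
# Each modality-specific classifier (e.g. audio, motion, CNN) emits a
# confidence score; logistic regression learns one decision weight per
# classifier. Data and parameters below are illustrative only.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_fusion(scores, labels, lr=0.5, epochs=2000):
    """scores: per-sample vectors of classifier outputs;
    labels: 1 if the event occurred in the sample, else 0.
    Returns (weights, bias) -- the learned decision weights."""
    n = len(scores[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(scores, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def fuse(w, b, x):
    """Fused event probability for one sample's classifier outputs."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy data: [audio score, motion score, CNN score] per video clip.
train_scores = [[0.9, 0.8, 0.95], [0.2, 0.1, 0.3],
                [0.85, 0.7, 0.9], [0.1, 0.3, 0.2]]
train_labels = [1, 0, 1, 0]
w, b = train_fusion(train_scores, train_labels)
print(fuse(w, b, [0.9, 0.75, 0.9]) > 0.5)  # consistently high scores -> event
```

The learned weights make the relative reliability of each modality explicit: a classifier whose scores correlate strongly with the event label receives a larger decision weight in the fused output.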