Fusion of multimodal information for multimedia information retrieval

Thesis Type: Doctorate

Institution of the Thesis: Orta Doğu Teknik Üniversitesi (Middle East Technical University), Faculty of Engineering, Department of Computer Engineering, Turkey

Approval Date: 2014


Advisor: ADNAN YAZICI


Effective retrieval of multimedia data depends on its semantic content. To extract this semantic content, the nature of multimedia data must be analyzed carefully and the information it contains must be fully exploited. Multimedia data usually has a complex structure containing multimodal information. Noise in the data, the non-universality of any single modality, and the performance upper bound of each modality make it hard to rely on a single modality. Multimodal fusion is therefore a practical approach to improving retrieval performance. However, it poses two major challenges: 'what-to-fuse' and 'how-to-fuse'. Within the scope of these challenges, the contribution of this thesis is fourfold. First, a general fusion framework is constructed by analyzing the studies in the literature and identifying the design aspects of general information fusion systems. Second, a class-specific feature selection (CSF) approach and a RELIEF-based modality weighting algorithm (RELIEF-MM) are proposed to handle the 'what-to-fuse' problem. Third, the 'how-to-fuse' problem is studied, and a novel mining- and graph-based combination approach is proposed, which enables an effective combination of modalities represented with bag-of-words models. Lastly, a non-linear extension of the linear weighted fusion approach is proposed that handles the 'what-to-fuse' and 'how-to-fuse' problems together. We have conducted comprehensive experiments on the CalTech101, TRECVID 2007, 2008 and 2011, and CCV datasets with various multi-feature and multimodal settings, and the results validate that the proposed algorithms are efficient, accurate and robust ways of dealing with these challenges of multimodal information fusion.
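The linear weighted fusion approach that the last contribution extends can be illustrated with a minimal Python sketch. It assumes each modality has already produced a relevance score in [0, 1] for the same list of items and combines them as a weighted sum, s(x) = w_1 s_1(x) + ... + w_M s_M(x), with the weights renormalized to sum to 1. The function name, modality names, scores, and weights are hypothetical; the sketch shows only the generic linear baseline, not RELIEF-MM weighting or the thesis's non-linear extension.

```python
from typing import Dict, List


def linear_weighted_fusion(
    scores: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Fuse per-modality scores for the same items by a weighted sum.

    Hypothetical helper: scores[m][i] is modality m's score for item i,
    assumed already normalized to [0, 1]; weights[m] is the importance
    given to modality m (renormalized here so the weights sum to 1).
    """
    if set(scores) != set(weights):
        raise ValueError("every modality needs exactly one weight")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lengths = {len(s) for s in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must score the same items")

    fused = [0.0] * lengths.pop()
    for modality, modality_scores in scores.items():
        w = weights[modality] / total
        for i, value in enumerate(modality_scores):
            fused[i] += w * value
    return fused


# Hypothetical example: three modalities scoring three candidate items.
scores = {
    "visual": [0.9, 0.2, 0.4],
    "audio": [0.1, 0.8, 0.5],
    "text": [0.7, 0.3, 0.6],
}
weights = {"visual": 0.5, "audio": 0.2, "text": 0.3}

fused = linear_weighted_fusion(scores, weights)
# Rank item indices by fused score, best first.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
print(fused, ranking)
```

Because each fused score is a fixed linear combination, a modality that is unreliable for some classes still contributes with the same weight everywhere; this is the kind of limitation that class-specific weighting and non-linear extensions aim to address.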