Content-based video copy detection

Thesis Type: Master's

Institution: Middle East Technical University, Faculty of Engineering, Department of Electrical and Electronics Engineering, Turkey

Thesis Approval Date: 2014

Student: SAVAŞ ÖZKAN

Advisor: GÖZDE AKAR

Abstract:

In recent years, the need for automatic video copy detection has grown rapidly alongside technical developments. In general, a system must meet several requirements to operate over a large database, including high detection accuracy, low comparison time and low memory usage. To this end, this thesis proposes a content-based video copy detection system that consists of three stages: feature extraction, quantization-based indexing and geometric verification. In the feature extraction stage, local spatial and spatio-temporal features are extracted from the reference and query videos for use in similarity score calculation. In the spatial domain, Scale Invariant Feature Transform (SIFT), Opponent SIFT, Flip Invariant SIFT (F-SIFT) and Speeded Up Robust Features (SURF) descriptors are used; in the spatio-temporal domain, Histogram of Oriented Gradients (HoG) and Motion Boundary Histogram (MBH) descriptors are used. In the second stage, to compare local features efficiently, the features are quantized into indices with three state-of-the-art indexing schemes: Bag-of-Words, Hamming Embedding and Product Quantization. In the final stage, because matching the content indices produces outliers, a geometric post-processing step is applied to both spatial and spatio-temporal features; it imposes an overall geometric model to refine the accuracy. Additionally, a compact geometric signature that encodes the local relations of interest points as a binary signature is computed. Experimental results are presented on the well-known TRECVID 2009 content-based video copy detection dataset. The experiments show that the combination of Flip Invariant SIFT, Hamming Embedding, enhanced weak geometric consistency and visual group binary signature yields the best overall result.
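The quantization-based indexing stage can be illustrated with a minimal sketch of Hamming Embedding style matching. This is not the thesis implementation: the function names, the toy 2-D descriptors, the identity projection and the per-word thresholds are all assumptions chosen for illustration. In practice the centroids would come from clustering training descriptors, and the thresholds are usually per-word medians of the projected training data.

```python
def nearest_centroid(vec, centroids):
    """Return the index of the centroid closest to vec (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for idx, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(vec, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def binary_signature(vec, projection, thresholds):
    """Project vec with the given matrix, then threshold each component to a bit."""
    projected = [sum(p * v for p, v in zip(row, vec)) for row in projection]
    return tuple(1 if x > t else 0 for x, t in zip(projected, thresholds))


def hamming(a, b):
    """Count the positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def he_match(query, reference, centroids, projection, thresholds, max_dist):
    """Two descriptors match if they share a visual word and their
    binary signatures differ in at most max_dist bits."""
    word_q = nearest_centroid(query, centroids)
    word_r = nearest_centroid(reference, centroids)
    if word_q != word_r:
        return False
    sig_q = binary_signature(query, projection, thresholds[word_q])
    sig_r = binary_signature(reference, projection, thresholds[word_r])
    return hamming(sig_q, sig_r) <= max_dist


# Hypothetical toy setup: two visual words, 2-bit signatures.
CENTROIDS = [[0.0, 0.0], [10.0, 10.0]]
PROJECTION = [[1.0, 0.0], [0.0, 1.0]]        # identity, for readability only
THRESHOLDS = [[0.0, 0.0], [10.0, 10.0]]      # one threshold vector per word
```

With this toy setup, `he_match([1.0, 1.0], [0.5, 2.0], CENTROIDS, PROJECTION, THRESHOLDS, 1)` accepts the pair, while a reference descriptor that falls in the same cell but on the opposite side of both thresholds is rejected. The binary signature is what lets the scheme filter out coarse visual-word matches that Bag-of-Words alone would keep.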