Performance evaluation of similarity measures for dense multimodal stereovision

Yaman, Mustafa; Kalkan, Sinan

doi:10.1117/1.jei.25.3.033013

Yaman M., Kalkan S.

JOURNAL OF ELECTRONIC IMAGING, vol. 25, 2016 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 25
Publication Date: 2016
DOI Number: 10.1117/1.jei.25.3.033013
Journal Name: JOURNAL OF ELECTRONIC IMAGING
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Keywords: dense multimodal stereovision, similarity measures, Kinect device, INFORMATION-BASED REGISTRATION, AUTOMATIC REGISTRATION, IMAGE REGISTRATION, MAXIMIZATION
Affiliated with Middle East Technical University: Yes

Abstract

Multimodal imaging systems have recently been drawing attention in fields such as medical imaging, remote sensing, and video surveillance. In such systems, estimating depth has become possible thanks to progress in multimodal matching techniques. We perform a systematic performance evaluation of similarity measures frequently used in the literature for dense multimodal stereovision. The evaluated measures include mutual information (MI), sum of squared distances, normalized cross-correlation, census transform, and local self-similarity (LSS), as well as descriptors adapted to multimodal settings, such as scale invariant feature transform (SIFT), speeded-up robust features (SURF), histogram of oriented gradients (HOG), binary robust independent elementary features, and fast retina keypoint (FREAK). We evaluate the measures over datasets we generated, compiled, and provided as a benchmark, and compare their performance using the Winner Takes All method. The datasets are (1) four popular pairs from the Middlebury Stereo Dataset (namely, Tsukuba, Venus, Cones, and Teddy), synthetically modified, and (2) our own multimodal image pairs acquired using the infrared and the electrooptical cameras of a Kinect device. The results show that MI and HOG provide promising results for multimodal imagery, and that FREAK, SURF, SIFT, and LSS can be considered as alternatives depending on the multimodality level and the computational complexity requirements of the intended application. © 2016 SPIE and IS&T
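
As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of window-based Winner Takes All disparity selection with mutual information as the similarity measure. It is not the authors' implementation: the function names (`mutual_information`, `wta_disparity`), the histogram bin count, the window size, and the rectified left/right convention are all assumptions made for this example.

```python
import numpy as np


def mutual_information(a, b, bins=8):
    """Mutual information (in nats) between two equally sized gray-level patches.

    Hypothetical helper for illustration; the bin count is an arbitrary choice.
    """
    # Joint histogram of co-occurring intensities, normalized to a joint probability.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal distribution of patch a
    py = pxy.sum(axis=0, keepdims=True)  # marginal distribution of patch b
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


def wta_disparity(left, right, max_disp, half_win=3, similarity=mutual_information):
    """Winner Takes All: per pixel, keep the disparity with the highest similarity.

    Assumes rectified images, so a left pixel (y, x) matches right pixel (y, x - d).
    Border pixels without a full window or full disparity range stay at 0.
    """
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=np.int32)
    for y in range(half_win, h - half_win):
        for x in range(half_win + max_disp, w - half_win):
            ref = left[y - half_win:y + half_win + 1,
                       x - half_win:x + half_win + 1]
            scores = []
            for d in range(max_disp + 1):
                cand = right[y - half_win:y + half_win + 1,
                             x - d - half_win:x - d + half_win + 1]
                scores.append(similarity(ref, cand))
            disparity[y, x] = int(np.argmax(scores))
    return disparity
```

A simple way to try the sketch is to build a synthetic "multimodal" pair by shifting an image horizontally and inverting its intensities, for example `right = 255.0 - np.roll(left, -3, axis=1)`, and then call `wta_disparity(left, right, max_disp=5)`. Any other measure from the abstract can be swapped in through the `similarity` argument, provided that larger values mean a better match.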