A NOVEL BOVW MIMICKING END-TO-END TRAINABLE CNN CLASSIFICATION FRAMEWORK USING OPTIMAL TRANSPORT THEORY

26th IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22-25 September 2019, pp. 3053-3057

Publication Type: Conference Paper / Full-Text Paper
Volume Number:
DOI: 10.1109/icip.2019.8803276
City of Publication: Taipei
Country of Publication: Taiwan
Pages: pp. 3053-3057
Affiliated with Middle East Technical University: Yes

Abstract

An end-to-end trainable convolutional neural network (CNN) framework that mimics the bag of visual words (BoVW) model is proposed for image classification. To this end, a new paradigm for histogram-like image representation is introduced, and the optimal transport (OT) distance is used for similarity assessment. Every patch of an image is treated as a unique visual word, and the image is represented as a uniform histogram over these visual words, with each histogram bin associated with an embedding vector that reflects the semantic meaning of the corresponding visual word. Thus, in the CNN framework, the output of the last convolutional block is taken as the global representation of the image, and the embeddings are learned inherently within the classification framework. With the proposed formulation, the undesired quantization step of the BoVW representation is no longer required; moreover, the learned CNN features are naturally interpretable. Experiments on the CIFAR-10, CIFAR-100, and SVHN datasets show that replacing the global pooling and fully connected layers with the proposed representation, together with the OT distance, improves on the baseline CNN framework.
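
As a rough illustration of the idea in the abstract, the sketch below compares two images that are each represented as a uniform histogram whose bins are patch embedding vectors. It is not the authors' implementation: the abstract does not say how the OT distance is computed, so the entropy-regularized Sinkhorn iteration, the squared Euclidean ground cost, and the names `sq_euclidean`, `sinkhorn_ot`, `reg`, and `n_iters` are all assumptions made for this example. In the framework described above, the embeddings would come from the last convolutional block; here they are small hand-written vectors.

```python
import math


def sq_euclidean(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sinkhorn_ot(X, Y, reg=0.1, n_iters=200):
    """Approximate OT cost between two uniform histograms.

    X and Y are lists of embedding vectors (one per image patch). Each
    patch is its own histogram bin with equal mass, as in the abstract.
    Assumption: entropy-regularized Sinkhorn iterations stand in for
    whatever OT solver the paper actually uses.
    """
    n, m = len(X), len(Y)
    a = [1.0 / n] * n  # uniform mass over the patches of the first image
    b = [1.0 / m] * m  # uniform mass over the patches of the second image

    # Ground cost between every pair of patch embeddings.
    C = [[sq_euclidean(x, y) for y in Y] for x in X]
    # Gibbs kernel; very large costs relative to reg can underflow to zero.
    K = [[math.exp(-c / reg) for c in row] for row in C]

    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]

    # Transport plan and the transport cost it induces.
    P = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    cost = sum(P[i][j] * C[i][j] for i in range(n) for j in range(m))
    return cost, P


# Toy patch embeddings (hypothetical values, two patches per image).
image_a = [[0.0, 0.0], [1.0, 0.0]]
image_b = [[0.0, 1.0], [1.0, 1.0]]

cost_same, _ = sinkhorn_ot(image_a, image_a)
cost_diff, plan = sinkhorn_ot(image_a, image_b)
print(cost_same, cost_diff)
```

Because every patch keeps its own bin, no codebook or quantization step appears anywhere in the sketch; the comparison works directly on the embedding vectors. A trainable version would need a differentiable tensor implementation rather than Python lists.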