HoughNet: Integrating Near and Long-Range Evidence for Bottom-Up Object Detection

Samet N., Hicsonmez S., Akbaş E.

16th European Conference on Computer Vision, ECCV 2020, Glasgow, United Kingdom, 23-28 August 2020, vol. 12370 LNCS, pp. 406-423 (Full Paper)

Publication Type: Conference Paper / Full Paper
Volume: 12370 LNCS
DOI: 10.1007/978-3-030-58595-2_25
City of Publication: Glasgow
Country of Publication: United Kingdom
Pages: 406-423
Keywords: Bottom-up recognition, Hough transform, Image-to-image translation, Object detection, Voting
Open Archive Collection: AVESİS Open Access Collection
Affiliated with Orta Doğu Teknik Üniversitesi (Middle East Technical University): Yes

Abstract

© 2020, Springer Nature Switzerland AG.

This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a given location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet can integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies only on local evidence. On the COCO dataset, HoughNet’s best model achieves 46.4 AP (and 65.1 AP50), performing on par with the state of the art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal on another task, namely “labels to photo” image generation, by integrating the voting module of HoughNet into two different GAN models and showing that the accuracy is significantly improved in both cases. Code is available at https://github.com/nerminsamet/houghnet.
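The voting idea described in the abstract can be illustrated with a minimal, single-class sketch in Python. This is a hypothetical illustration, not HoughNet's implementation. The region layout is an assumption: a central disc plus `num_rings` rings whose radii double, each cut into `num_angles` equal sectors. The function and parameter names (`log_polar_region`, `region_areas`, `vote`, `num_rings`, `num_angles`, `r0`) and the normalisation by region area are also assumptions. Each location carries one evidence value per vote-field region, and a target location's score is the sum of the votes cast on it. For class-conditional detection, the same routine would be run once per class.

```python
# Hypothetical sketch of log-polar Hough voting for one class.
# The region layout, names and area normalisation are assumptions,
# not HoughNet's actual code (see the official repository for that).
import math


def log_polar_region(dx, dy, num_rings=3, num_angles=4, r0=1.0):
    """Return the vote-field region index of offset (dx, dy), or None.

    Region 0 is the central disc of radius r0. Ring k (k >= 1) covers radii
    [r0 * 2**(k - 1), r0 * 2**k) and is split into num_angles equal sectors,
    so there are 1 + num_rings * num_angles regions in total.
    """
    r = math.hypot(dx, dy)
    if r < r0:
        return 0
    ring = math.floor(math.log2(r / r0)) + 1
    if ring > num_rings:
        return None  # too far away: this offset casts no vote
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi / num_angles)) % num_angles
    return 1 + (ring - 1) * num_angles + sector


def region_areas(num_rings=3, num_angles=4, r0=1.0):
    """Map every integer offset inside the field to its region; count pixels per region."""
    extent = math.ceil(r0 * 2 ** num_rings)
    table, areas = {}, [0] * (1 + num_rings * num_angles)
    for dy in range(-extent, extent + 1):
        for dx in range(-extent, extent + 1):
            k = log_polar_region(dx, dy, num_rings, num_angles, r0)
            if k is not None:
                table[(dx, dy)] = k
                areas[k] += 1
    return table, areas


def vote(evidence, num_rings=3, num_angles=4, r0=1.0):
    """Sum votes for one class into an object-presence map.

    evidence[k][y][x] says how strongly location (x, y) supports an object
    whose centre sees (x, y) in region k. Each target (tx, ty) collects the
    evidence of every source whose offset from it falls in region k, divided
    by that region's pixel area.
    """
    table, areas = region_areas(num_rings, num_angles, r0)
    if len(evidence) != len(areas):
        raise ValueError(f"expected {len(areas)} region maps, got {len(evidence)}")
    h, w = len(evidence[0]), len(evidence[0][0])
    scores = [[0.0] * w for _ in range(h)]
    for ty in range(h):
        for tx in range(w):
            total = 0.0
            for (dx, dy), k in table.items():
                sx, sy = tx + dx, ty + dy  # source location relative to target
                if 0 <= sx < w and 0 <= sy < h:
                    total += evidence[k][sy][sx] / areas[k]
            scores[ty][tx] = total
    return scores


if __name__ == "__main__":
    H = W = 9
    evidence = [[[0.0] * W for _ in range(H)] for _ in range(13)]
    evidence[0][4][4] = 1.0                        # local evidence at (4, 4)
    evidence[log_polar_region(3, 0)][4][7] = 1.0   # long-range evidence 3 px right
    evidence[log_polar_region(-3, 0)][4][1] = 1.0  # long-range evidence 3 px left
    scores = vote(evidence)
    best = max((scores[y][x], (x, y)) for y in range(H) for x in range(W))
    print(best[1])  # (4, 4)
```

In the usage example, the centre (4, 4) wins because it receives the local vote and both long-range votes. Dividing by the region's pixel count keeps the large outer rings from overwhelming the central region. The brute-force triple loop is chosen for readability; a practical detector would vectorise the voting step.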