HoughNet: Integrating Near and Long-Range Evidence for Visual Detection

SAMET, NERMİN; Hicsonmez, Samet; AKBAŞ, EMRE

doi:10.1109/tpami.2022.3200413

SAMET N., Hicsonmez S., AKBAŞ E.

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.45, no.4, pp.4667-4681, 2023 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 45 Issue: 4
Publication Date: 2023
DOI: 10.1109/tpami.2022.3200413
Journal Name: IEEE Transactions on Pattern Analysis and Machine Intelligence
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, ABI/INFORM, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, EMBASE, INSPEC, MEDLINE, Metadex, zbMATH, Civil Engineering Abstracts
Page Numbers: pp.4667-4681
Keywords: Object detection, voting, bottom-up recognition, Hough transform, video object detection, instance segmentation, 3D object detection, human pose estimation, image-to-image translation, label-to-image translation
Middle East Technical University Affiliated: Yes

Abstract

This paper presents HoughNet, a one-stage, anchor-free, voting-based, bottom-up object detection method. Inspired by the Generalized Hough Transform, HoughNet determines the presence of an object at a certain location by the sum of the votes cast on that location. Votes are collected from both near and long-distance locations based on a log-polar vote field. Thanks to this voting mechanism, HoughNet is able to integrate both near and long-range, class-conditional evidence for visual recognition, thereby generalizing and enhancing current object detection methodology, which typically relies on only local evidence. On the COCO dataset, HoughNet's best model achieves 46.4 $AP$ (and 65.1 $AP_{50}$), performing on par with the state-of-the-art in bottom-up object detection and outperforming most major one-stage and two-stage methods. We further validate the effectiveness of our proposal in other visual detection tasks, namely, video object detection, instance segmentation, 3D object detection and keypoint detection for human pose estimation, and an additional “labels to photo” image generation task, where the integration of our voting module consistently improves performance in all cases. Code is available at https://github.com/nerminsamet/houghnet.
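
The voting idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it is a minimal illustration under stated assumptions. The function names `log_polar_field` and `accumulate_votes`, the ring radii (2, 4, 8), the four angular sectors, and the per-region normalization are all hypothetical choices made here for illustration, since the abstract does not specify them. The sketch handles a single class; a class-conditional version would repeat it per class.

```python
import numpy as np

def log_polar_field(num_rings=3, num_angles=4, base_radius=2):
    """Return a (K, K) array of region ids; -1 marks cells outside the field.

    Ring radii double at each step (assumed: 2, 4, 8), so far-away regions
    cover more cells than nearby ones, as in a log-polar layout.
    """
    max_r = base_radius * 2 ** (num_rings - 1)
    ys, xs = np.mgrid[-max_r:max_r + 1, -max_r:max_r + 1]
    radius = np.hypot(ys, xs)
    angle = np.mod(np.arctan2(ys, xs), 2 * np.pi)
    sector = np.minimum((angle / (2 * np.pi / num_angles)).astype(int),
                        num_angles - 1)
    field = np.full(radius.shape, -1, dtype=int)
    field[radius == 0] = 0  # the centre cell is its own region
    inner = 0.0
    for ring in range(num_rings):
        outer = base_radius * 2 ** ring
        mask = (radius > inner) & (radius <= outer)
        field[mask] = 1 + ring * num_angles + sector[mask]
        inner = outer
    return field

def accumulate_votes(vote_maps, field):
    """Sum votes into an (H, W) evidence map.

    vote_maps has shape (R, H, W): channel r holds the strength with which
    each location votes for the cells in region r of the field centred on it.
    Each vote is divided by the region size (an assumed normalization).
    """
    num_regions, h, w = vote_maps.shape
    half = field.shape[0] // 2
    out = np.zeros((h, w))
    for region in range(num_regions):
        offsets = np.argwhere(field == region) - half
        for dy, dx in offsets:
            if abs(dy) >= h or abs(dx) >= w:
                continue  # offset larger than the map: no valid target
            # A voter at (y, x) votes for the target (y + dy, x + dx).
            src_y = slice(max(0, -dy), min(h, h - dy))
            src_x = slice(max(0, -dx), min(w, w - dx))
            dst_y = slice(max(0, dy), min(h, h + dy))
            dst_x = slice(max(0, dx), min(w, w + dx))
            out[dst_y, dst_x] += vote_maps[region, src_y, src_x] / len(offsets)
    return out

# Usage: a single voter at the centre of a 33x33 map votes in every region.
field = log_polar_field()
num_regions = field.max() + 1
vote_maps = np.zeros((num_regions, 33, 33))
vote_maps[:, 16, 16] = 1.0
evidence = accumulate_votes(vote_maps, field)
```

In this sketch the evidence at a location is the sum of votes arriving from both nearby and distant voters, which is the property the abstract emphasizes. Peaks in the resulting map would then be treated as candidate object locations.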