DEPTH IS ALL YOU NEED: SINGLE-STAGE WEAKLY SUPERVISED SEMANTIC SEGMENTATION FROM IMAGE-LEVEL SUPERVISION


Ergül M., ALATAN A. A.

29th IEEE International Conference on Image Processing, ICIP 2022, Bordeaux, France, 16 - 19 October 2022, pp. 4233-4237

  • Publication Type: Conference Paper / Full-Text Conference Paper
  • DOI Number: 10.1109/icip46576.2022.9897161
  • City of Publication: Bordeaux
  • Country of Publication: France
  • Page Numbers: pp. 4233-4237
  • Keywords: Depth, Self-supervision, Semantic segmentation, Single stage, Weak supervision
  • Affiliated with Middle East Technical University: Yes

Abstract

The costly process of obtaining semantic segmentation labels has driven research toward weakly supervised semantic segmentation (WSSS) methods, in which only image-level labels are available for training. The lack of a dense semantic scene representation forces methods to grow in complexity in order to recover additional semantic information (i.e., object/stuff extent and boundaries) about the scene. This is often achieved through increased model complexity and sophisticated multi-stage training/refinement procedures. However, the absence of 3D geometric structure in a single image ultimately limits these efforts. In this work, we propose to harness (inverse) depth maps estimated from a single image via a monocular depth estimation model to integrate the 3D geometric structure of the scene into the segmentation model. Building on this proposal, we develop an end-to-end segmentation network and a self-supervised training process that learn semantic masks from only image-level annotations in a single stage. Our experiments show that our single-stage method achieves segmentation performance on Pascal VOC (val: 64.32, test: 64.91) comparable to that of significantly more complex pipelines, and it outperforms state-of-the-art single-stage methods.
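To make the core idea concrete, the sketch below illustrates one simple way an estimated inverse depth map could be fused with an image before segmentation: a frozen monocular depth model produces a one-channel inverse depth map, which is concatenated with the RGB input of a segmentation network. This is only a minimal illustration under assumed design choices; the `ToyDepthEstimator`, `DepthAwareSegNet`, and channel-concatenation fusion are placeholders and not the architecture described in the paper.

```python
# Minimal sketch (not the authors' architecture): injecting an estimated
# inverse depth map into a segmentation network by concatenating it with
# the RGB input. The depth estimator is a hypothetical stand-in for an
# off-the-shelf pretrained monocular depth model.
import torch
import torch.nn as nn


class ToyDepthEstimator(nn.Module):
    """Placeholder for a pretrained monocular (inverse) depth model."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, rgb):
        # Non-negative inverse depth, normalized per image.
        inv_depth = torch.relu(self.net(rgb))
        return inv_depth / (inv_depth.amax(dim=(2, 3), keepdim=True) + 1e-6)


class DepthAwareSegNet(nn.Module):
    """Segmentation network that consumes RGB + inverse depth (4 channels)."""
    def __init__(self, num_classes=21):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.classifier = nn.Conv2d(32, num_classes, 1)

    def forward(self, rgb, inv_depth):
        # Fuse geometric structure (depth) with appearance (RGB).
        x = torch.cat([rgb, inv_depth], dim=1)
        return self.classifier(self.backbone(x))


if __name__ == "__main__":
    rgb = torch.rand(2, 3, 128, 128)
    depth_model = ToyDepthEstimator().eval()
    with torch.no_grad():  # depth model is kept frozen
        inv_depth = depth_model(rgb)
    seg_logits = DepthAwareSegNet(num_classes=21)(rgb, inv_depth)
    print(seg_logits.shape)  # torch.Size([2, 21, 128, 128])
```

In practice, the depth cue could also be fused at the feature level or used as a self-supervision signal rather than as an extra input channel; this sketch only shows the input-level variant for brevity.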