RICOA: Rich Captioning with Object Attributes


Sahin E. M., AKAR G.

2024 IEEE International Conference on Consumer Electronics, ICCE 2024, Nevada, United States, 6-8 January 2024

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/icce59016.2024.10444202
  • City of Publication: Nevada
  • Country of Publication: United States
  • Keywords: image captioning, novel object captioning, object attributes, object tags, vision-language pretraining
  • Affiliated with Middle East Technical University: Yes

Abstract

In this study, we demonstrate how state-of-the-art baseline image captioning methods overlook important details in the image, and we analyze the reasons behind this problem. We propose a novel approach, named RICOA (RIch Captioning with Object Attributes), which integrates object attributes into the generated captions. Our analyses demonstrate that the proposed approach generates richer and more visually grounded captions by successfully incorporating the attributes of the objects in the scene.