RICOA: Rich Captioning with Object Attributes

Sahin E. M., AKAR G.

2024 IEEE International Conference on Consumer Electronics, ICCE 2024, Nevada, United States of America, 6 - 8 January 2024, (Full Text Paper)

Publication Type: Conference Paper / Full Text Paper
DOI Number: 10.1109/icce59016.2024.10444202
City of Publication: Nevada
Country of Publication: United States of America
Keywords: image captioning, novel object captioning, object attributes, object tags, vision-language pretraining
Affiliated with Middle East Technical University: Yes

Abstract

In this study, we demonstrate how state-of-the-art baseline image captioning methods overlook important details in the image, and we analyze the reasons behind this problem. We propose a novel approach, named RICOA (RIch Captioning with Object Attributes), which integrates object attributes into the generated captions. Our analyses demonstrate that the proposed approach generates richer and more visually grounded captions by successfully incorporating the attributes of the objects in the scene.