RICOA: Rich Captioning with Object Attributes


Sahin E. M., AKAR G.

2024 IEEE International Conference on Consumer Electronics, ICCE 2024, Nevada, United States of America, 6-8 January 2024

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/icce59016.2024.10444202
  • City: Nevada
  • Country: United States of America
  • Keywords: image captioning, novel object captioning, object attributes, object tags, vision-language pretraining
  • Middle East Technical University Affiliated: Yes

Abstract

In this study, we demonstrate how state-of-the-art baseline image captioning methods overlook important details in the image, and we analyze the reasons behind this problem. We propose a novel approach, named RICOA (RIch Captioning with Object Attributes), which integrates object attributes into the generated captions. Our analyses demonstrate that the proposed approach generates richer and more visually grounded captions by successfully integrating the attributes of the objects in the scene.