Data-driven image captioning via salient region discovery


Kilickaya M., Akkuş B. K., Cakici R., Erdem A., Erdem E., İkizler Cinbiş N.

IET COMPUTER VISION, vol.11, no.6, pp.398-406, 2017 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 11, Issue: 6
  • Publication Date: 2017
  • DOI: 10.1049/iet-cvi.2016.0286
  • Journal: IET COMPUTER VISION
  • Page Numbers: pp.398-406
  • Keywords: image retrieval, image representation, feature extraction, pattern clustering, visual databases, text analysis, data-driven image captioning, salient region discovery, training images, object-based semantic image representation, deep feature-based retrieval framework, phrase selection paradigm, sentence generation model, input images, retrieved images, clustering framework, Flickr8K benchmark dataset, Flickr30K benchmark dataset, MODELS


In the past few years, automatically generating descriptions for images has attracted considerable attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep feature-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model that depend on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on the Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results compared with state-of-the-art models.
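The retrieval step described in the abstract — comparing a query image's deep features against those of training images to find relevant captions — can be sketched as a nearest-neighbour search. The function below is a minimal illustration only, not the authors' implementation: the feature arrays, caption list, and cosine-similarity choice are assumptions for the sake of the example.

```python
import numpy as np

def retrieve_captions(query_feat, train_feats, train_captions, k=3):
    """Return captions of the k training images whose (hypothetical) deep
    features are most cosine-similar to the query image's features."""
    # Normalize so that dot products equal cosine similarities.
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = t @ q
    # Indices of the k most similar training images, best first.
    top = np.argsort(-sims)[:k]
    return [train_captions[i] for i in top]
```

In the paper's pipeline, the captions retrieved in this way would then feed the phrase selection and sentence generation stages; here they are simply returned as-is.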