Data-driven image captioning via salient region discovery

Kilickaya, Mert; AKKUŞ, BURAK; Cakici, RUKET; Erdem, Aykut; Erdem, Erkut; İKİZLER CİNBİŞ, NAZLI

doi:10.1049/iet-cvi.2016.0286

Kilickaya M., AKKUŞ B. K., Cakici R., Erdem A., Erdem E., İKİZLER CİNBİŞ N.

IET COMPUTER VISION, vol.11, no.6, pp.398-406, 2017 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 11 Issue: 6
Publication Date: 2017
DOI: 10.1049/iet-cvi.2016.0286
Journal Name: IET COMPUTER VISION
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
Page Numbers: pp.398-406
Keywords: image retrieval, image representation, feature extraction, pattern clustering, visual databases, text analysis, data-driven image captioning, salient region discovery, training images, object-based semantic image representation, deep feature-based retrieval framework, phrase selection paradigm, sentence generation model, input images, retrieved images, clustering framework, Flickr8K benchmark dataset, Flickr30K benchmark dataset, MODELS
Middle East Technical University Affiliated: Yes

Abstract

In the past few years, automatically generating descriptions for images has attracted a lot of attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have been proven to be highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep features-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model which depends on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of their proposed approach on Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results compared with the state-of-the-art models.
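
The retrieval step that data-driven captioning methods share, as described in the abstract, can be illustrated with a minimal sketch. It is not the authors' implementation. The cosine similarity measure, the function names, and the toy feature vectors and captions are all assumptions made for illustration, and the paper's object-based representation, phrase selection, and salient region clustering are not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_captions(query_features, training_set, k=2):
    """Return the captions of the k training images most similar to the query.

    training_set is a list of (feature_vector, caption) pairs.
    """
    ranked = sorted(
        training_set,
        key=lambda item: cosine_similarity(query_features, item[0]),
        reverse=True,
    )
    return [caption for _, caption in ranked[:k]]


# Toy, made-up feature vectors and captions (hypothetical, not from Flickr8K/30K).
TRAINING_SET = [
    ([1.0, 0.0, 0.0], "a dog runs on the grass"),
    ([0.9, 0.1, 0.0], "a brown dog plays in a field"),
    ([0.0, 1.0, 0.0], "a man rides a bicycle"),
    ([0.0, 0.0, 1.0], "children play on the beach"),
]

# Captions of the retrieved neighbours become the raw material for description.
candidates = retrieve_captions([0.95, 0.05, 0.0], TRAINING_SET, k=2)
print(candidates)
```

In a full system the feature vectors would come from a learned image representation, and the retrieved captions would be passed to a phrase selection and sentence generation stage rather than returned directly.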