IET Computer Vision (Sep 2017)
Data‐driven image captioning via salient region discovery
Abstract
In the past few years, automatically generating descriptions for images has attracted considerable attention in computer vision and natural language processing research. Among the existing approaches, data-driven methods have proven highly effective. These methods compare the given image against a large set of training images to determine a set of relevant images, and then generate a description using the associated captions. In this study, the authors propose to integrate an object-based semantic image representation into a deep-feature-based retrieval framework to select the relevant images. Moreover, they present a novel phrase selection paradigm and a sentence generation model that rely on a joint analysis of salient regions in the input and retrieved images within a clustering framework. The authors demonstrate the effectiveness of the proposed approach on the Flickr8K and Flickr30K benchmark datasets and show that their model gives highly competitive results compared with state-of-the-art models.
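As a rough illustration of the retrieval step described above (not the authors' exact pipeline, and omitting the object-based semantic representation), the following sketch retrieves the k training images most similar to a query under cosine similarity over precomputed deep features; the feature values below are random placeholders.

    import numpy as np

    def retrieve_similar(query_feat, train_feats, k=5):
        """Return indices of the k training images whose deep features are
        most similar to the query image under cosine similarity."""
        q = query_feat / np.linalg.norm(query_feat)
        t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
        sims = t @ q                   # cosine similarity to every training image
        return np.argsort(-sims)[:k]   # indices of the k nearest neighbours

    # Toy usage: 1000 training images with 4096-d features (e.g. CNN fc7-style).
    rng = np.random.default_rng(0)
    train_feats = rng.standard_normal((1000, 4096))
    query_feat = rng.standard_normal(4096)
    print(retrieve_similar(query_feat, train_feats, k=5))

The captions associated with the retrieved neighbours would then feed the phrase selection and sentence generation stages summarised in the abstract.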
Keywords