Symmetry (Nov 2018)

Sequential Dual Attention: Coarse-to-Fine-Grained Hierarchical Generation for Image Captioning

  • Zhibin Guan,
  • Kang Liu,
  • Yan Ma,
  • Xu Qian,
  • Tongkai Ji

DOI
https://doi.org/10.3390/sym10110626
Journal volume & issue
Vol. 10, no. 11
p. 626

Abstract


Image caption generation, a fundamental task that bridges an image and its textual description, is drawing increasing interest in artificial intelligence. Images and textual sentences are two different carriers of information that are symmetric and unified in the content of the same visual scene. Existing image captioning methods rarely generate the final description sentence in a coarse-grained to fine-grained manner, which is how humans understand the surrounding scene, and the generated sentence sometimes describes only coarse-grained image content. Therefore, we propose a coarse-to-fine-grained hierarchical generation method for image captioning, named SDA-CFGHG, to address these two problems. The core of our SDA-CFGHG method is a sequential dual attention that fuses visual information of different granularities in a sequential manner. The advantage of our SDA-CFGHG method is that it performs image captioning in a coarse-to-fine-grained way, so the generated sentence can capture details of the raw image to some degree. Moreover, we validate the strong performance of our method on the benchmark MS COCO and Flickr datasets with several popular evaluation metrics: CIDEr, SPICE, METEOR, ROUGE-L, and BLEU.
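The abstract describes the sequential dual attention only at a high level. The sketch below illustrates one plausible reading, in which a decoder hidden state first attends to coarse-grained visual features (e.g., global or grid features) and then, conditioned on that coarse context, attends to fine-grained features (e.g., object-level features). All module names, dimensions, and the additive-attention form are assumptions made for illustration, not the paper's exact SDA formulation.

```python
# Illustrative sketch only: a sequential dual attention block that first attends
# to coarse-grained visual features and then to fine-grained features.
# Dimensions, names, and the additive-attention form are assumptions, not the
# exact SDA module from the paper.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, feat_dim, hid_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hid_proj = nn.Linear(hid_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, num_regions, feat_dim), hidden: (batch, hid_dim)
        energy = torch.tanh(self.feat_proj(feats) + self.hid_proj(hidden).unsqueeze(1))
        weights = torch.softmax(self.score(energy).squeeze(-1), dim=1)  # (batch, num_regions)
        context = (weights.unsqueeze(-1) * feats).sum(dim=1)            # (batch, feat_dim)
        return context, weights

class SequentialDualAttention(nn.Module):
    """Coarse attention followed by fine attention conditioned on the coarse context."""
    def __init__(self, coarse_dim, fine_dim, hid_dim, attn_dim):
        super().__init__()
        self.coarse_attn = AdditiveAttention(coarse_dim, hid_dim, attn_dim)
        self.fine_attn = AdditiveAttention(fine_dim, hid_dim + coarse_dim, attn_dim)
        self.fuse = nn.Linear(coarse_dim + fine_dim, hid_dim)

    def forward(self, coarse_feats, fine_feats, hidden):
        coarse_ctx, _ = self.coarse_attn(coarse_feats, hidden)
        # Condition the second (fine-grained) attention on the coarse context.
        fine_query = torch.cat([hidden, coarse_ctx], dim=-1)
        fine_ctx, _ = self.fine_attn(fine_feats, fine_query)
        # Fused visual context to feed the language decoder at this time step.
        return torch.tanh(self.fuse(torch.cat([coarse_ctx, fine_ctx], dim=-1)))

# Example usage with toy shapes (all sizes are arbitrary):
sda = SequentialDualAttention(coarse_dim=512, fine_dim=512, hid_dim=512, attn_dim=256)
coarse = torch.randn(2, 7, 512)   # e.g., grid-level features
fine = torch.randn(2, 36, 512)    # e.g., detected-object features
h = torch.randn(2, 512)           # decoder hidden state
ctx = sda(coarse, fine, h)        # (2, 512)
```

The sequential design, where the fine-grained attention query includes the coarse context, is one way to realize the coarse-to-fine idea the abstract describes; other fusion orders or gating schemes would also fit the description.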

Keywords