Symmetry (Nov 2018)

Sequential Dual Attention: Coarse-to-Fine-Grained Hierarchical Generation for Image Captioning

  • Zhibin Guan,
  • Kang Liu,
  • Yan Ma,
  • Xu Qian,
  • Tongkai Ji

DOI
https://doi.org/10.3390/sym10110626
Journal volume & issue
Vol. 10, no. 11
p. 626

Abstract


Image caption generation, a fundamental task that bridges an image and its textual description, is drawing increasing interest in artificial intelligence. Images and textual sentences are two different carriers of information that are symmetric and unified in the content of the same visual scene. Existing image captioning methods rarely generate the final description sentence in a coarse-grained to fine-grained manner, which is how humans understand the surrounding scene, and the generated sentence sometimes describes only coarse-grained image content. Therefore, we propose a coarse-to-fine-grained hierarchical generation method for image captioning, named SDA-CFGHG, to address these two problems. The core of our SDA-CFGHG method is a sequential dual attention that fuses visual information of different granularities in a sequential manner. The advantage of our SDA-CFGHG method is that it performs image captioning in a coarse-to-fine-grained way, so the generated sentence can capture details of the raw image to some degree. Moreover, we validate the strong performance of our method on the benchmark MS COCO and Flickr datasets with several popular evaluation metrics: CIDEr, SPICE, METEOR, ROUGE-L, and BLEU.
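The abstract describes the sequential dual attention only at a high level. The sketch below illustrates one plausible reading, in which a decoder hidden state first attends to coarse-grained visual features (e.g., global or grid features) and then, conditioned on that coarse context, attends to fine-grained features (e.g., object-level features). All module names, dimensions, and the additive-attention form are assumptions made for illustration, not the paper's exact SDA formulation.

```python
# Illustrative sketch only: a sequential dual attention block that first attends
# to coarse-grained visual features and then to fine-grained features.
# Dimensions, names, and the additive-attention form are assumptions, not the
# exact SDA module from the paper.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, feat_dim, hid_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hid_proj = nn.Linear(hid_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, num_regions, feat_dim), hidden: (batch, hid_dim)
        energy = torch.tanh(self.feat_proj(feats) + self.hid_proj(hidden).unsqueeze(1))
        weights = torch.softmax(self.score(energy).squeeze(-1), dim=1)  # (batch, num_regions)
        context = (weights.unsqueeze(-1) * feats).sum(dim=1)            # (batch, feat_dim)
        return context, weights

class SequentialDualAttention(nn.Module):
    """Coarse attention followed by fine attention conditioned on the coarse context."""
    def __init__(self, coarse_dim, fine_dim, hid_dim, attn_dim):
        super().__init__()
        self.coarse_attn = AdditiveAttention(coarse_dim, hid_dim, attn_dim)
        self.fine_attn = AdditiveAttention(fine_dim, hid_dim + coarse_dim, attn_dim)
        self.fuse = nn.Linear(coarse_dim + fine_dim, hid_dim)

    def forward(self, coarse_feats, fine_feats, hidden):
        coarse_ctx, _ = self.coarse_attn(coarse_feats, hidden)
        # Condition the second (fine-grained) attention on the coarse context.
        fine_query = torch.cat([hidden, coarse_ctx], dim=-1)
        fine_ctx, _ = self.fine_attn(fine_feats, fine_query)
        # Fused visual context to feed the language decoder at this time step.
        return torch.tanh(self.fuse(torch.cat([coarse_ctx, fine_ctx], dim=-1)))

# Example usage with toy shapes (all sizes are arbitrary):
sda = SequentialDualAttention(coarse_dim=512, fine_dim=512, hid_dim=512, attn_dim=256)
coarse = torch.randn(2, 7, 512)   # e.g., grid-level features
fine = torch.randn(2, 36, 512)    # e.g., detected-object features
h = torch.randn(2, 512)           # decoder hidden state
ctx = sda(coarse, fine, h)        # (2, 512)
```

The sequential design, where the fine-grained attention query includes the coarse context, is one way to realize the coarse-to-fine idea the abstract describes; other fusion orders or gating schemes would also fit the description.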

Keywords