Structure Preserving Convolutional Attention for Image Captioning

Shichen Lu; Ruimin Hu; Jing Liu; Longteng Guo; Fei Zheng

doi:10.3390/app9142888

Applied Sciences (Jul 2019)

Affiliations

Shichen Lu: National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University, Wuhan 430072, China
Ruimin Hu: National Engineering Research Center for Multimedia Software, School of Computer, Wuhan University, Wuhan 430072, China
Jing Liu: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Longteng Guo: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Fei Zheng: China General Technology Research Institute, Beijing 100190, China

DOI: https://doi.org/10.3390/app9142888
Journal volume & issue: Vol. 9, no. 14, p. 2888

Abstract

In image captioning, learning which image regions to attend to is necessary for focusing adaptively and precisely on the object semantics relevant to each decoded word. In this paper, we propose a convolutional attention module that preserves the spatial structure of the image by performing the convolution operation directly on the 2D feature maps. The proposed attention mechanism contains two components, convolutional spatial attention and cross-channel attention, which determine the regions to describe along the spatial and channel dimensions, respectively. Both attentions are computed at each decoding step. To preserve the spatial structure, the two attention components are computed directly on the entire feature maps with convolution operations, rather than on the vector representation of each image grid. Experiments on two large-scale datasets (MSCOCO and Flickr30K) demonstrate the effectiveness of the proposed method.
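The abstract does not give the attention equations, so the following is only a minimal NumPy sketch of the general idea and not the authors' formulation. It assumes a feature map `V` of shape (C, H, W) and a decoder hidden state `h`. Every weight name (`W_h`, `W_sp`, `W_ch`), the tanh fusion, the 3x3 and 1x1 kernel sizes, and the softmax normalisation are illustrative assumptions. The point it shows is that both attention maps are produced by convolutions over the whole 2D feature map, with no flattening of grid cells into independent vectors.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def conv2d_same(x, w):
    """Cross-correlation with zero 'same' padding.
    x: (C_in, height, width); w: (C_out, C_in, k, k) with odd k.
    Returns (C_out, height, width)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    _, height, width = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((c_out, height, width))
    for i in range(k):
        for j in range(k):
            # Mix input channels for this kernel offset and accumulate.
            window = xp[:, i:i + height, j:j + width]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], window)
    return out

def fuse(V, h, W_h):
    """Assumed fusion: broadcast the projected hidden state over the grid."""
    return np.tanh(V + (W_h @ h)[:, None, None])

def spatial_attention(V, h, W_h, W_sp):
    """k x k convolution to one channel, then softmax over all locations."""
    scores = conv2d_same(fuse(V, h, W_h), W_sp)[0]        # (H, W)
    return softmax(scores.ravel()).reshape(scores.shape)

def channel_attention(V, h, W_h, W_ch):
    """1 x 1 convolution mixes channels; spatial mean gives per-channel scores."""
    mixed = conv2d_same(fuse(V, h, W_h), W_ch)            # (C, H, W)
    return softmax(mixed.mean(axis=(1, 2)))               # (C,)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, D = 8, 7, 7, 16
V = rng.normal(size=(C, H, W))          # CNN feature map
h = rng.normal(size=D)                  # decoder hidden state at one step
W_h = rng.normal(size=(C, D)) * 0.1
W_sp = rng.normal(size=(1, C, 3, 3)) * 0.1
W_ch = rng.normal(size=(C, C, 1, 1)) * 0.1

alpha = spatial_attention(V, h, W_h, W_sp)   # where to look, shape (H, W)
beta = channel_attention(V, h, W_h, W_ch)    # which channels, shape (C,)
# Attended context vector passed to the decoder for the next word.
context = (V * alpha[None, :, :] * beta[:, None, None]).sum(axis=(1, 2))
```

In a captioning decoder, `alpha` and `beta` would be recomputed from the new hidden state at every decoding step, matching the abstract's statement that both attentions are calculated per step.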

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
