Cross-scale Feature Fusion Self-attention for Image Captioning

WANG Ming-zhan, JI Jun-zhong, JIA Ao-zhe, ZHANG Xiao-dan

doi:10.11896/jsjkx.220600009

Jisuanji kexue (Oct 2022)

Affiliations

WANG Ming-zhan, JI Jun-zhong, JIA Ao-zhe, ZHANG Xiao-dan: School of Computer Science, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; Beijing Institute of Artificial Intelligence, Beijing University of Technology, Beijing 100124, China

DOI: https://doi.org/10.11896/jsjkx.220600009
Journal volume & issue: Vol. 49, no. 10
pp. 191 – 197

Abstract

In recent years, the encoder-decoder framework based on the self-attention mechanism has become the mainstream model for image captioning. However, self-attention in the encoder models only the visual relations among low-scale features and ignores useful information in high-scale visual features, which degrades the quality of the generated descriptions. To solve this problem, this paper proposes a cross-scale feature fusion self-attention (CFFSA) method for image captioning. Specifically, CFFSA integrates low-scale and high-scale visual features within self-attention to broaden the range of attention from a visual perspective. This increases the amount of effective visual information and reduces noise, so that the model learns more accurate visual and semantic relationships. Experiments on the MS COCO dataset show that the proposed method captures the relationships between cross-scale visual features more accurately and generates more accurate descriptions. In addition, CFFSA is a general method: combining it with other self-attention-based image captioning methods can further improve their performance.
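The abstract does not give the CFFSA formulation, so the following is only a minimal sketch of one way features at two scales could be combined inside a single self-attention step. It assumes that fusion means concatenating the two token sets so that each low-scale query attends over keys and values from both scales. That fusion rule, the function name `cross_scale_attention`, the random weights, and the feature sizes are all illustrative assumptions rather than the authors' method.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cross_scale_attention(low, high, w_q, w_k, w_v):
    """Queries come from `low`; keys and values come from low + high tokens.

    Assumption: the two scales are fused by concatenating their token sets
    before attention. The paper may fuse the scales differently.
    """
    fused = low + high  # list concatenation: all tokens from both scales
    queries = [matvec(w_q, x) for x in low]
    keys = [matvec(w_k, x) for x in fused]
    values = [matvec(w_v, x) for x in fused]
    scale = math.sqrt(len(queries[0]))

    outputs, weights = [], []
    for q in queries:
        # Scaled dot-product score of this query against every fused key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        # Attention-weighted sum of the value vectors.
        outputs.append([sum(a * v[j] for a, v in zip(attn, values))
                        for j in range(len(values[0]))])
    return outputs, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4
low_feats = random_matrix(6, dim, rng)   # 6 tokens at one scale
high_feats = random_matrix(3, dim, rng)  # 3 tokens at the other scale
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

out, attn = cross_scale_attention(low_feats, high_feats, w_q, w_k, w_v)
print(len(out), len(out[0]), len(attn[0]))  # prints: 6 4 9
```

Each of the 6 query tokens receives a distribution over all 9 fused tokens, which is one reading of what the abstract calls a wider range of attention; ordinary self-attention over `low_feats` alone would give each query only 6 candidates.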

Keywords: image captioning, self-attention, cross-scale feature fusion

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
