Remote Sensing (Mar 2020)

A Multi-Level Attention Model for Remote Sensing Image Captions

  • Yangyang Li,
  • Shuangkang Fang,
  • Licheng Jiao,
  • Ruijiao Liu,
  • Ronghua Shang

DOI
https://doi.org/10.3390/rs12060939
Journal volume & issue
Vol. 12, no. 6
p. 939

Abstract

Image captioning, the task of generating a sentence that appropriately describes an image, lies at the intersection of computer vision and natural language processing. Although research on remote sensing image captioning has only just begun, it is of great significance. The attention mechanism, which is inspired by the way humans think, is widely used in remote sensing image captioning. However, the attention mechanisms currently used in this task operate mainly on the image and are too simple to model such a complex task well. In this paper, we therefore propose a multi-level attention model that more closely imitates human attention. The model contains three attention structures, representing attention to different areas of the image, attention to different words, and attention to vision and semantics. Experiments show that our model achieves better results than previous methods and is currently state-of-the-art. In addition, the existing datasets for remote sensing image captioning contain a large number of errors, so we have made extensive corrections to them in order to promote research on remote sensing image captioning.
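The abstract names three attention levels but does not give their equations. The sketch below only illustrates how such levels could be stacked at a single decoding step; it is not the formulation used in the paper. It assumes plain scaled dot-product attention with no learned weights, a decoder hidden state used as the query at every level, and image-region features and word embeddings already projected to the same dimension. The function names (`attend`, `multi_level_attention`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, items: List[Vector]) -> Tuple[Vector, List[float]]:
    """Scaled dot-product attention where each item is both key and value.

    Returns the attention-weighted sum of the items and the weights.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, item) / scale for item in items])
    context = [sum(w * item[i] for w, item in zip(weights, items))
               for i in range(len(query))]
    return context, weights


def multi_level_attention(hidden: Vector,
                          region_feats: List[Vector],
                          word_embs: List[Vector]) -> Tuple[Vector, List[float]]:
    """Hypothetical three-level attention for one decoding time step.

    Level 1: attention over image regions (visual context).
    Level 2: attention over previously generated words (semantic context).
    Level 3: attention over the two contexts, weighting vision vs. semantics.
    """
    visual_ctx, _ = attend(hidden, region_feats)                 # level 1
    semantic_ctx, _ = attend(hidden, word_embs)                  # level 2
    fused, mix = attend(hidden, [visual_ctx, semantic_ctx])      # level 3
    return fused, mix


if __name__ == "__main__":
    h = [1.0, 0.0]                          # toy decoder hidden state
    regions = [[1.0, 0.0], [0.0, 1.0]]      # toy image-region features
    words = [[0.5, 0.5]]                    # toy embedding of one earlier word
    ctx, mix = multi_level_attention(h, regions, words)
    print("fused context:", ctx)
    print("vision/semantics weights:", mix)
```

Because the third level is itself an attention over the two context vectors, its weights sum to 1 and show, per time step, how much the decoder leans on visual versus word-level evidence. A learned model would add projection matrices at each level; they are omitted here to keep the sketch self-contained.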

Keywords