TSFE: Two-Stage Feature Enhancement for Remote Sensing Image Captioning

Jie Guo; Ze Li; Bin Song; Yuhao Chi

doi:10.3390/rs16111843

Remote Sensing (May 2024)

Affiliations

Jie Guo: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Ze Li: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Bin Song: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China
Yuhao Chi: State Key Laboratory of Integrated Services Networks, Xidian University, Xi’an 710071, China

DOI: https://doi.org/10.3390/rs16111843
Journal volume & issue: Vol. 16, no. 11, p. 1843

Abstract

Mainstream methods for remote sensing image captioning (RSIC) typically adopt an encoder–decoder framework. Methods built on this framework often rely on simple feature fusion strategies and therefore fail to fully mine the fine-grained features of remote sensing images. Moreover, the decoder lacks context information, which makes the generated sentences less accurate. To address these problems, we propose a two-stage feature enhancement model (TSFE) for remote sensing image captioning. In the first stage, we adopt an adaptive feature fusion strategy to acquire multi-scale features. In the second stage, we further mine fine-grained features from the multi-scale features by establishing associations between different regions of the image. In addition, we introduce global features carrying scene information into the decoder to help generate descriptions. Experimental results on the RSICD, UCM-Captions, and Sydney-Captions datasets demonstrate that the proposed method outperforms existing state-of-the-art approaches.
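
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the fusion rule or how region associations are computed, so the sketch assumes softmax-weighted averaging of same-shaped multi-scale feature maps for stage one, unparameterized dot-product self-attention across regions for stage two, and mean pooling for the global feature. The function names `adaptive_fusion` and `region_self_attention`, the feature shapes, and the gate logits are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(scales, gate_logits):
    """Stage 1 (assumed form): weighted sum of S feature maps of shape (R, D).

    gate_logits stands in for learned per-scale scores; softmax turns
    them into fusion weights that sum to 1.
    """
    weights = softmax(np.asarray(gate_logits, dtype=float))  # (S,)
    stacked = np.stack(scales, axis=0)                       # (S, R, D)
    return np.tensordot(weights, stacked, axes=1)            # (R, D)

def region_self_attention(x):
    """Stage 2 (assumed form): relate every region to every other region.

    Plain dot-product attention without learned projections, followed
    by a residual connection.
    """
    d = x.shape[-1]
    attn = softmax(x @ x.T / np.sqrt(d), axis=-1)            # (R, R)
    return x + attn @ x                                      # (R, D)

# Hypothetical input: 3 scales, 49 regions (7x7 grid), 64 channels each.
rng = np.random.default_rng(0)
scales = [rng.normal(size=(49, 64)) for _ in range(3)]

fused = adaptive_fusion(scales, gate_logits=[0.2, 0.5, 0.3])
refined = region_self_attention(fused)

# Assumed global scene feature: mean over regions, to be fed to a decoder.
global_feat = refined.mean(axis=0)
```

In a trained model the gate logits and attention projections would be learned parameters; fixed values are used here only to keep the sketch self-contained.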

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
