International Journal of Digital Earth (Dec 2024)

Incorporating object counts into remote sensing image captioning

  • Zihao Ni,
  • Zhaoyun Zong,
  • Peng Ren

DOI
https://doi.org/10.1080/17538947.2024.2392847
Journal volume & issue
Vol. 17, no. 1

Abstract

Existing methods for remote sensing image captioning tend to describe an image in generic language that omits specific information about object counts. To address this limitation, we propose a novel framework that generates captions containing object count information for remote sensing images. The framework comprises three modules: object counting, preliminary captioning, and numeral editing. The object counting module identifies the objects in a remote sensing image and determines how many of each are present. The preliminary captioning module generates a caption that may lack object count information. The numeral editing module incorporates the object counts into that caption, yielding a more precise description. Evaluations on three remote sensing image datasets show that the framework outperforms existing methods. The framework is a significant step toward more precise and informative remote sensing image captioning.
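
As a rough illustration of how the three modules could fit together, the following Python sketch tallies detected object labels and then rewrites vague quantifiers in a preliminary caption with the resulting counts. It is not the authors' implementation: the rule-based editing, the quantifier list, the naive "add an s" pluralization, and the names `count_objects`, `number_word`, and `edit_numerals` are all assumptions made for this example, and the detector and captioner are replaced by hard-coded inputs.

```python
import re
from collections import Counter

# Hypothetical vague quantifiers a preliminary caption might contain.
VAGUE_QUANTIFIERS = ("a lot of", "lots of", "several", "many", "some")
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def count_objects(detected_labels):
    """Stand-in for the object counting module: tally detector labels."""
    return dict(Counter(detected_labels))


def number_word(n):
    """Spell out small counts; fall back to digits for larger ones."""
    return NUMBER_WORDS[n] if 0 <= n < len(NUMBER_WORDS) else str(n)


def edit_numerals(caption, counts):
    """Stand-in for numeral editing: swap a vague quantifier for a count.

    Assumes plural nouns are formed by appending "s" and allows at most
    one modifier word between the quantifier and the noun.
    """
    quantifiers = "|".join(re.escape(q) for q in VAGUE_QUANTIFIERS)
    edited = caption
    for noun, n in counts.items():
        if n < 2:
            continue  # singular phrasing is left untouched in this sketch
        pattern = re.compile(
            rf"\b(?:{quantifiers})\s+((?:\w+\s+)?){re.escape(noun)}s\b",
            re.IGNORECASE,
        )
        edited = pattern.sub(
            lambda m, n=n, noun=noun: f"{number_word(n)} {m.group(1)}{noun}s",
            edited,
        )
    return edited


# Hard-coded inputs standing in for detector and captioner outputs.
labels = ["airplane"] * 4 + ["building"]
preliminary = "many airplanes are parked near a building"
print(edit_numerals(preliminary, count_objects(labels)))
# prints "four airplanes are parked near a building"
```

Keeping the counting and editing steps separate mirrors the modular structure described in the abstract: either stand-in could be replaced by a learned component without changing the other.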

Keywords