IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

NLP-Based Fusion Approach to Robust Image Captioning

  • Riccardo Ricci,
  • Farid Melgani,
  • Jose Marcato Junior,
  • Wesley Nunes Goncalves

DOI
https://doi.org/10.1109/JSTARS.2024.3413323
Journal volume & issue
Vol. 17
pp. 11809 – 11822

Abstract

Robustness in remote sensing image captioning is crucial for real-world applications. However, most research focuses on improving the performance of single captioning algorithms, either by introducing novel feature processing units or by adding metatasks that indirectly improve captioning performance. Despite the indisputable performance gains, we argue that relying on the output of a single model can be risky, especially when data scarcity limits the generalization capability of the trained algorithms. Building on the advantages of ensembles for improving robustness, we propose different ways to select or generate a single, most coherent caption from a set of predictions made by different captioning algorithms. Decoupling the prediction phase from the selection/generation phase provides high flexibility for inserting different captioning algorithms, each with its own peculiarities and strengths. In this context, our approach, which is based on neural natural language processing tools, can be considered an additional fusion block that provides higher robustness at a contained cost in complexity.
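
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract specifies neural natural language processing tools, whereas this example substitutes a simple token-overlap (Jaccard) similarity as a stand-in, and the names `tokens`, `jaccard`, and `select_consensus` are hypothetical. The idea shown is only the general one of picking, from several candidate captions, the one that agrees most with the rest of the ensemble.

```python
from __future__ import annotations


def tokens(caption: str) -> set[str]:
    """Lowercase the caption, strip basic punctuation, and return its set of words."""
    return {w.strip(".,;:!?") for w in caption.lower().split()} - {""}


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1].

    Assumption: this stands in for a neural sentence-similarity score.
    """
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_consensus(captions: list[str]) -> str:
    """Return the candidate with the highest mean similarity to the other candidates."""
    if not captions:
        raise ValueError("need at least one candidate caption")
    if len(captions) == 1:
        return captions[0]

    token_sets = [tokens(c) for c in captions]

    def mean_similarity(i: int) -> float:
        scores = [jaccard(token_sets[i], t) for j, t in enumerate(token_sets) if j != i]
        return sum(scores) / len(scores)

    # max() keeps the first index on ties, so the earliest candidate wins a tie.
    best = max(range(len(captions)), key=mean_similarity)
    return captions[best]


if __name__ == "__main__":
    # Hypothetical outputs of four different captioning models for one image.
    candidates = [
        "many planes are parked at the airport",
        "several planes are parked near the airport terminal",
        "a river runs through a forest",
        "many planes are parked at an airport",
    ]
    print(select_consensus(candidates))
```

Because the selector only consumes caption strings, any captioning model can be added to or removed from the candidate list without retraining, which mirrors the decoupling of prediction and selection that the abstract emphasizes.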

Keywords