Toward Remote Sensing Image Retrieval Under a Deep Image Captioning Perspective

Genc Hoxha; Farid Melgani; Begum Demir

doi:10.1109/JSTARS.2020.3013818

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2020)

Affiliations

Genc Hoxha: Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Farid Melgani: Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
Begum Demir: Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin, Berlin, Germany

Journal volume & issue: Vol. 13
pp. 4462 – 4475

Abstract

The performance of remote sensing image retrieval (RSIR) systems depends on how well the extracted features characterize the semantic content of images. Existing RSIR systems describe images by visual descriptors that model the primitives (such as different land-cover classes) present in the images. However, visual descriptors may not be sufficient to describe the high-level complex content of RS images (e.g., attributes and relationships among different land-cover classes). To address this issue, in this article, we present an RSIR system that generates and exploits textual descriptions, in the form of captions (i.e., sentences), to accurately describe the objects present in RS images, their attributes, and the relationships among them. The proposed retrieval system consists of three main steps. The first step encodes the visual features of the image and then translates the encoded features into a textual description that summarizes the content of the image with captions. This is achieved by combining a convolutional neural network with a recurrent neural network. The second step converts the generated textual descriptions into semantically meaningful feature vectors. This is achieved by using recent word embedding techniques. Finally, the last step estimates the similarity between the vectors of the textual descriptions of the query image and those of the archive images, and then retrieves the images most similar to the query image. Experimental results obtained on two different datasets show that describing the image content with captions in the framework of RSIR leads to accurate retrieval performance.
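The second and third steps described above (caption to vector, then similarity ranking) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the toy three-dimensional vectors in `TOY_EMBEDDINGS`, the choice of averaging word vectors to obtain a caption vector, the use of cosine similarity, the helper names, and the example captions in `ARCHIVE`. The captions are assumed to have already been produced by a captioning model (step one), which is not shown.

```python
import math

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
TOY_EMBEDDINGS = {
    "river": [0.9, 0.1, 0.0],
    "bridge": [0.7, 0.3, 0.1],
    "forest": [0.1, 0.9, 0.0],
    "trees": [0.0, 1.0, 0.1],
    "airport": [0.0, 0.1, 0.9],
    "runway": [0.1, 0.0, 1.0],
}
DIM = 3  # dimensionality of the toy vectors above


def caption_vector(caption):
    """Average the vectors of known words (an assumed pooling choice)."""
    vectors = [TOY_EMBEDDINGS[w] for w in caption.lower().split()
               if w in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_caption, archive_captions, top_k=2):
    """Rank archive images by caption similarity to the query caption."""
    query = caption_vector(query_caption)
    scored = [(cosine(query, caption_vector(text)), name)
              for name, text in archive_captions.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical archive: image identifiers mapped to generated captions.
ARCHIVE = {
    "img_a": "a bridge over a river",
    "img_b": "a forest with dense trees",
    "img_c": "an airport with a runway",
}

if __name__ == "__main__":
    print(retrieve("a river near a bridge", ARCHIVE, top_k=1))
```

In this sketch, words missing from the embedding table are simply ignored, so the quality of the ranking depends on the vocabulary coverage of the embeddings as much as on the similarity measure.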

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
