Efficient Discrimination and Localization of Multimodal Remote Sensing Images Using CNN-Based Prediction of Localization Uncertainty

Mykhail Uss; Benoit Vozel; Vladimir Lukin; Kacem Chehdi

doi:10.3390/rs12040703

Remote Sensing (Feb 2020)

Affiliations

Mykhail Uss: Department of Information-Communication Technologies, National Aerospace University, 61070 Kharkiv, Ukraine
Benoit Vozel: IETR UMR CNRS 6164, University of Rennes 1, Enssat, 22305 Lannion, France
Vladimir Lukin: Department of Information-Communication Technologies, National Aerospace University, 61070 Kharkiv, Ukraine
Kacem Chehdi: IETR UMR CNRS 6164, University of Rennes 1, Enssat, 22305 Lannion, France

DOI: https://doi.org/10.3390/rs12040703
Journal volume & issue: Vol. 12, no. 4, p. 703

Abstract

Detecting similarities between image patches and measuring their mutual displacement are key steps in the registration of multimodal remote sensing (RS) images. Deep learning approaches have advanced the discriminative power of learned similarity measures (SMs). However, their ability to find the best spatial alignment of the compared patches is often ignored. We propose to unify the patch discrimination and localization problems by assuming that the more accurately two patches can be aligned, the more similar they are. The uncertainty (or confidence) in the localization of a patch pair then serves as a similarity measure for these patches. We train a two-channel patch matching convolutional neural network (CNN), called DLSM, to solve a regression problem with uncertainty. This CNN takes two multimodal patches as input and outputs a prediction of the translation vector between them, together with the uncertainty of this prediction in the form of an error covariance matrix of the translation vector. In other words, the proposed patch matching CNN predicts a two-dimensional normal distribution of the translation vector rather than a single point estimate. The determinant of the covariance matrix is used as a measure of uncertainty in the matching of patches and also as a measure of similarity between patches. For training, we used a Siamese architecture with three towers. During training, two towers receive the same pair of multimodal patches but shifted by a random translation; the third tower is fed a pair of dissimilar patches. Experiments performed on a large database of real RS images show that the proposed DLSM has both higher discriminative power and more precise localization than existing hand-crafted SMs and SMs trained with conventional losses. Unlike existing SMs, DLSM correctly predicts the translation error distribution ellipse for different modalities, noise levels, and isotropic and anisotropic structures.
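
The idea of predicting a two-dimensional normal distribution for the translation, and scoring a patch pair by the determinant of its covariance matrix, can be illustrated with a minimal sketch. The abstract does not give the exact loss or the parameterization of the covariance, so everything below is an assumption for illustration: the function names (`covariance_from_params`, `det2`, `gaussian_nll`) are hypothetical, the covariance is built from two standard deviations and a correlation coefficient, and the loss is the standard negative log-likelihood of a bivariate Gaussian.

```python
import math

def covariance_from_params(sigma_x, sigma_y, rho):
    """Build a 2x2 covariance matrix from two standard deviations and a
    correlation coefficient (an assumed parameterization, not the paper's)."""
    cxy = rho * sigma_x * sigma_y
    return [[sigma_x ** 2, cxy], [cxy, sigma_y ** 2]]

def det2(cov):
    """Determinant of a 2x2 matrix; smaller means a tighter error ellipse."""
    return cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]

def gaussian_nll(pred, cov, target):
    """Negative log-likelihood of `target` under the 2D normal N(pred, cov)."""
    dx = target[0] - pred[0]
    dy = target[1] - pred[1]
    d = det2(cov)
    # Squared Mahalanobis distance via the closed-form inverse of a 2x2 matrix.
    maha = (cov[1][1] * dx * dx - 2.0 * cov[0][1] * dx * dy + cov[0][0] * dy * dy) / d
    return 0.5 * maha + 0.5 * math.log(d) + math.log(2.0 * math.pi)

# A well-localized pair (tight ellipse) versus a poorly localized one.
confident = covariance_from_params(0.5, 0.5, 0.0)
uncertain = covariance_from_params(3.0, 3.0, 0.0)
print(det2(confident) < det2(uncertain))  # smaller determinant: more similar

# Anisotropic structure: the translation is well constrained along y only.
edge_like = covariance_from_params(2.0, 0.5, 0.0)
print(gaussian_nll((0.0, 0.0), edge_like, (1.0, 0.0)))  # error along the loose axis
print(gaussian_nll((0.0, 0.0), edge_like, (0.0, 1.0)))  # error along the tight axis
```

In this sketch, an error of the same size is penalized more heavily along the direction where the predicted ellipse is narrow, which is the behavior one would expect from a model that predicts anisotropic localization uncertainty. Under the stated assumption, a dissimilar patch pair would be expected to yield a large determinant.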

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
