IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2021)
Multimodal Sensor Fusion Using Symmetric Skip Autoencoder Via an Adversarial Regulariser
Abstract
The fusion of the spatial characteristics of a visible image with the spectral aspects of an infrared image is of immense practical importance. In this work, we propose a novel spatially constrained adversarial autoencoder that extracts deep features from the infrared and visible images to obtain a more exhaustive and global representation. A residual autoencoder architecture, regularised by a residual adversarial network, is employed to generate a more realistic fused image. The residual module serves as the primary building block of the encoder, decoder, and adversarial network; in addition, symmetric skip connections embed the spatial characteristics directly from the initial layers of the encoder into the decoder part of the network. The spectral information of the infrared image is incorporated by adding its feature maps over several layers in the encoder part of the fusion structure. The encoder consists of two separate branches that carry out independent inference on the visible and infrared images. The loss function is designed to incorporate the characteristics of both modalities by optimising over the textural content of the visible image and the spectral content of its infrared counterpart. To optimise the network's parameters efficiently, an adversarial regulariser network is proposed that performs supervised learning on the fused image and the original visible image, since the visible image contains most of the structural content in comparison to the infrared image. The adversarial game is incorporated into the structure by adding a classification loss to the generator and discriminator loss functions, in addition to the content loss.
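The sketch below illustrates, in PyTorch, one plausible reading of this design: residual blocks as the shared building unit, a two-branch encoder whose infrared feature maps are added into the visible branch layer by layer, symmetric skip connections from encoder to decoder, a residual discriminator, and a composite generator loss over visible texture, infrared intensity, and the adversarial term. All class names, layer widths, and loss weights (`ResidualBlock`, `FusionAutoencoder`, `lambda_spec`, ...) are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of the skip-connected adversarial fusion autoencoder;
# names, depths, and weights are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual module: the primary building block of encoder,
    decoder, and adversarial network."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class FusionAutoencoder(nn.Module):
    """Two-branch encoder (visible / infrared) with symmetric skip
    connections into the decoder; infrared feature maps are added
    into the visible branch over several encoder layers."""
    def __init__(self, ch=32, depth=3):
        super().__init__()
        self.vis_in = nn.Conv2d(1, ch, 3, padding=1)
        self.ir_in = nn.Conv2d(1, ch, 3, padding=1)
        self.vis_enc = nn.ModuleList([ResidualBlock(ch) for _ in range(depth)])
        self.ir_enc = nn.ModuleList([ResidualBlock(ch) for _ in range(depth)])
        self.dec = nn.ModuleList([ResidualBlock(ch) for _ in range(depth)])
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, vis, ir):
        v, r = F.relu(self.vis_in(vis)), F.relu(self.ir_in(ir))
        skips = []
        for venc, renc in zip(self.vis_enc, self.ir_enc):
            r = renc(r)
            v = venc(v) + r      # inject infrared feature maps layer-wise
            skips.append(v)      # stash for the symmetric skip connections
        x = v
        for dec, skip in zip(self.dec, reversed(skips)):
            x = dec(x + skip)    # symmetric skip: encoder -> decoder
        return torch.sigmoid(self.out(x))

class Discriminator(nn.Module):
    """Residual adversarial regulariser classifying fused vs. visible."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU(),
            ResidualBlock(ch),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            ResidualBlock(ch),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def generator_loss(fused, vis, ir, disc, lambda_spec=5.0, lambda_adv=1.0):
    """Content loss (visible texture + infrared intensity) plus the
    adversarial classification term; the exact weights are assumptions."""
    def grads(x):  # horizontal and vertical image gradients
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
    gx_f, gy_f = grads(fused)
    gx_v, gy_v = grads(vis)
    texture = F.l1_loss(gx_f, gx_v) + F.l1_loss(gy_f, gy_v)
    spectral = F.l1_loss(fused, ir)   # infrared intensity content
    pred = disc(fused)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))
    return texture + lambda_spec * spectral + lambda_adv * adv
```

A matching discriminator update would push `disc(vis)` toward 1 and `disc(fused)` toward 0, with the two networks trained alternately as in a standard GAN.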
Keywords