IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2021)
Multimodal Sensor Fusion Using Symmetric Skip Autoencoder Via an Adversarial Regulariser
Abstract
The fusion of the spatial characteristics of a visible image with the spectral aspects of an infrared image is of immense practical importance. In this work, we propose a novel spatially constrained adversarial autoencoder that extracts deep features from the infrared and visible images to obtain a more exhaustive and global representation. A residual autoencoder architecture, regularised by a residual adversarial network, is employed to generate a more realistic fused image. The residual module serves as the primary building block of the encoder, decoder, and adversarial network; in addition, symmetric skip connections embed the spatial characteristics directly from the initial layers of the encoder into the decoder part of the network. The spectral information of the infrared image is incorporated by adding its feature maps over several layers in the encoder part of the fusion structure. The encoder consists of two separate branches that carry out independent inference on the visible and infrared images. The loss function is designed to incorporate the characteristics of both modalities by optimising over the textural content of the visible image and the spectral content of its infrared counterpart. To optimise the network's parameters efficiently, an adversarial regulariser network is proposed that performs supervised learning on the fused image and the original visible image, since the visible image contains most of the structural content in comparison to the infrared image. The adversarial game is incorporated into the structure by adding a classification loss to the generator and discriminator loss functions, in addition to the content loss.
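The sketch below illustrates, in PyTorch, one plausible reading of this design: residual blocks as the shared building unit, a two-branch encoder whose infrared feature maps are added into the visible branch layer by layer, symmetric skip connections from encoder to decoder, a residual discriminator, and a composite generator loss over visible texture, infrared intensity, and the adversarial term. All class names, layer widths, and loss weights (`ResidualBlock`, `FusionAutoencoder`, `lambda_spec`, ...) are illustrative assumptions, not the authors' published implementation.

```python
# Minimal sketch of the skip-connected adversarial fusion autoencoder;
# names, depths, and weights are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Residual module: the primary building block of encoder,
    decoder, and adversarial network."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return F.relu(x + self.conv2(F.relu(self.conv1(x))))

class FusionAutoencoder(nn.Module):
    """Two-branch encoder (visible / infrared) with symmetric skip
    connections into the decoder; infrared feature maps are added
    into the visible branch over several encoder layers."""
    def __init__(self, ch=32, depth=3):
        super().__init__()
        self.vis_in = nn.Conv2d(1, ch, 3, padding=1)
        self.ir_in = nn.Conv2d(1, ch, 3, padding=1)
        self.vis_enc = nn.ModuleList([ResidualBlock(ch) for _ in range(depth)])
        self.ir_enc = nn.ModuleList([ResidualBlock(ch) for _ in range(depth)])
        self.dec = nn.ModuleList([ResidualBlock(ch) for _ in range(depth)])
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, vis, ir):
        v, r = F.relu(self.vis_in(vis)), F.relu(self.ir_in(ir))
        skips = []
        for venc, renc in zip(self.vis_enc, self.ir_enc):
            r = renc(r)
            v = venc(v) + r      # inject infrared feature maps layer-wise
            skips.append(v)      # stash for the symmetric skip connections
        x = v
        for dec, skip in zip(self.dec, reversed(skips)):
            x = dec(x + skip)    # symmetric skip: encoder -> decoder
        return torch.sigmoid(self.out(x))

class Discriminator(nn.Module):
    """Residual adversarial regulariser classifying fused vs. visible."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU(),
            ResidualBlock(ch),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.ReLU(),
            ResidualBlock(ch),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

def generator_loss(fused, vis, ir, disc, lambda_spec=5.0, lambda_adv=1.0):
    """Content loss (visible texture + infrared intensity) plus the
    adversarial classification term; the exact weights are assumptions."""
    def grads(x):  # horizontal and vertical image gradients
        return x[..., :, 1:] - x[..., :, :-1], x[..., 1:, :] - x[..., :-1, :]
    gx_f, gy_f = grads(fused)
    gx_v, gy_v = grads(vis)
    texture = F.l1_loss(gx_f, gx_v) + F.l1_loss(gy_f, gy_v)
    spectral = F.l1_loss(fused, ir)   # infrared intensity content
    pred = disc(fused)
    adv = F.binary_cross_entropy(pred, torch.ones_like(pred))
    return texture + lambda_spec * spectral + lambda_adv * adv
```

A matching discriminator update would push `disc(vis)` toward 1 and `disc(fused)` toward 0, with the two networks trained alternately as in a standard GAN.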
Keywords