International Journal of Applied Earth Observation and Geoinformation (Dec 2022)
Joint alignment of the distribution in input and feature space for cross-domain aerial image semantic segmentation
Abstract
Benefiting from the development of deep learning, researchers have made significant progress and achieved strong performance in the semantic segmentation of remote sensing (RS) data. However, when a trained model encounters an unseen scenario, its performance deteriorates dramatically because of domain shift. Unsupervised domain adaptation (UDA) offers a way to address this issue. Aligning high-level representations via adversarial learning is a popular approach, but it becomes difficult when the gap in input space is large. With this in mind, we design a framework that jointly aligns the distributions in input and feature space. For input space alignment, we unify the resolution to keep the content consistent and propose a lightweight module named Digital Number Transformer (DNT) to reduce visual differences. For feature space alignment, we design a Multi-Scale Feature aggregation (MSF) module and introduce the Fine-Grained Discriminator (FGD) to conduct category-level alignment at multiple layers, so that features are fully aligned and negative transfer is reduced. We carried out experiments in diverse cross-domain scenarios, covering discrepancies in geographic position alone as well as in both geographic position and imaging mode. Comprehensive experiments demonstrate that our method outperforms other state-of-the-art methods in all scenarios.
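
To make the feature-space alignment described above more concrete, the following is a minimal PyTorch sketch (not the authors' released code) of category-level adversarial alignment with a fine-grained discriminator applied to aggregated multi-scale features. The module names (FineGrainedDiscriminator, multi_scale_aggregate, category_level_adv_loss), the channel sizes, the number of classes, and the softmax-weighted loss are illustrative assumptions rather than the paper's exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FineGrainedDiscriminator(nn.Module):
    """Predicts a per-category domain map instead of a single domain score,
    which is one way to realise category-level (fine-grained) alignment."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 128, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, num_classes, 3, padding=1),  # one domain logit per class
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)


def multi_scale_aggregate(feats: list) -> torch.Tensor:
    """Assumed aggregation: upsample all feature maps to the finest
    resolution and concatenate them along the channel dimension."""
    target = feats[0].shape[-2:]
    up = [F.interpolate(f, size=target, mode="bilinear", align_corners=False)
          for f in feats]
    return torch.cat(up, dim=1)


def category_level_adv_loss(disc_out: torch.Tensor,
                            seg_prob: torch.Tensor,
                            domain_label: float) -> torch.Tensor:
    """Binary cross-entropy on per-class domain logits, weighted by the
    segmentation softmax so each pixel mainly drives its predicted class."""
    seg_prob = F.interpolate(seg_prob, size=disc_out.shape[-2:],
                             mode="bilinear", align_corners=False)
    target = torch.full_like(disc_out, domain_label)
    bce = F.binary_cross_entropy_with_logits(disc_out, target, reduction="none")
    return (bce * seg_prob).mean()


if __name__ == "__main__":
    num_classes = 6  # illustrative number of land-cover classes (assumption)
    # Two backbone stages with different spatial resolutions and channel counts.
    feats = [torch.randn(2, 64, 64, 64), torch.randn(2, 128, 32, 32)]
    fused = multi_scale_aggregate(feats)                    # (2, 192, 64, 64)
    seg_prob = torch.softmax(torch.randn(2, num_classes, 64, 64), dim=1)

    fgd = FineGrainedDiscriminator(fused.shape[1], num_classes)
    loss = category_level_adv_loss(fgd(fused), seg_prob, domain_label=1.0)
    print(float(loss))

In an adversarial training loop of this kind, the discriminator would be trained to separate source from target features per class, while the segmentation network would be updated to fool it; the class-probability weighting is what distinguishes this category-level variant from a single global domain classifier.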