IEEE Access (Jan 2024)
VisionTwinNet: Gated Clarity Enhancement Paired With Light-Robust CD Transformers
Abstract
Deep learning has shown strong performance in change detection (CD) tasks, notably through the Transformer architecture, whose self-attention mechanism captures long-range dependencies and outperforms traditional models. This capability gives the Transformer significant advantages in capturing global-level features of complex object changes in high-resolution remote sensing images. Although Transformers are mature in natural language processing (NLP), their application in computer vision, and in CD tasks in particular, remains nascent, and current Transformer-based CD methods show limitations, especially under varied lighting and seasonal changes. To address this, we propose VisionTwinNet, a two-stage strategy. First, Gated EnhanceClearNet, a specially designed deep network, reduces image noise and enhances brightness while preserving shadows and correcting color distortions. Its gating mechanism adaptively adjusts the importance of features, yielding superior performance across a range of remote sensing image degradations. Second, we develop Hybrid Light-Robust CDNet, a hybrid lightweight network custom-designed for robust CD in remote sensing images. This module integrates the complementary strengths of CNNs and Transformers and introduces an attention design that optimizes the key and value dimensions separately, rather than using a single shared linear transformation, to ensure efficient detection. Specifically, the LR-Transformer Block employs a lightweight multi-head self-attention mechanism that improves computational efficiency while providing richer feature representations. Comparative studies against six CD methods on three public datasets validate VisionTwinNet's robustness and efficacy; our approach notably reduces algorithmic complexity and improves model efficiency.
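To make the attention design concrete, below is a minimal PyTorch sketch of a lightweight multi-head self-attention block in the spirit of the LR-Transformer Block. The abstract only states that the key and value dimensions are handled separately rather than through a single shared linear transformation; the specific choices here (separate K/V projections applied to a pooled, spatially reduced feature map, and all names and dimensions) are illustrative assumptions, not the authors' exact design.

import torch
import torch.nn as nn


class LightweightMHSA(nn.Module):
    """Hypothetical lightweight MHSA: queries keep full resolution, while keys
    and values are projected separately from a downsampled token set."""

    def __init__(self, dim: int, num_heads: int = 4, kv_reduction: int = 2):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5

        # Query projection over the full token sequence.
        self.q_proj = nn.Linear(dim, dim)
        # Keys and values get their own projections (assumed design), applied
        # to a spatially reduced feature map to cut the attention cost.
        self.kv_pool = nn.AvgPool2d(kernel_size=kv_reduction, stride=kv_reduction)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (batch, h*w, dim) flattened feature tokens.
        b, n, c = x.shape
        q = self.q_proj(x).reshape(b, n, self.num_heads, self.head_dim).transpose(1, 2)

        # Downsample tokens before the key/value projections.
        x_2d = x.transpose(1, 2).reshape(b, c, h, w)
        x_red = self.kv_pool(x_2d).flatten(2).transpose(1, 2)  # (b, n_red, c)
        n_red = x_red.shape[1]
        k = self.k_proj(x_red).reshape(b, n_red, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x_red).reshape(b, n_red, self.num_heads, self.head_dim).transpose(1, 2)

        # Scaled dot-product attention between full-resolution queries and
        # reduced keys/values, then merge heads.
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, c)
        return self.out_proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 32 * 32, 64)                    # two images, 32x32 tokens, 64 channels
    block = LightweightMHSA(dim=64, num_heads=4, kv_reduction=2)
    print(block(x, h=32, w=32).shape)                  # torch.Size([2, 1024, 64])

Because the keys and values attend over only (h/kv_reduction) x (w/kv_reduction) tokens, the attention map shrinks from n x n to n x (n / kv_reduction^2), which is one common way such lightweight attention variants trade a small amount of detail for efficiency.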
Keywords