Remote Sensing (Jul 2024)

CVTNet: A Fusion of Convolutional Neural Networks and Vision Transformer for Wetland Mapping Using Sentinel-1 and Sentinel-2 Satellite Data

  • Mohammad Marjani,
  • Masoud Mahdianpari,
  • Fariba Mohammadimanesh,
  • Eric W. Gill

DOI
https://doi.org/10.3390/rs16132427
Journal volume & issue
Vol. 16, no. 13
p. 2427

Abstract

Read online

Wetland mapping is a critical component of environmental monitoring, requiring advanced techniques to accurately represent the complex land cover patterns and subtle class differences innate in these ecosystems. This study aims to address these challenges by proposing CVTNet, a novel deep learning (DL) model that integrates convolutional neural networks (CNNs) and vision transformer (ViT) architectures. CVTNet uses channel attention (CA) and spatial attention (SA) mechanisms to enhance feature extraction from Sentinel-1 (S1) and Sentinel-2 (S2) satellite data. The primary goal of this model is to achieve a balanced trade-off between Precision and Recall, which is essential for accurate wetland mapping. The class-specific analysis demonstrated CVTNet’s proficiency across diverse classes, including pasture, shrubland, urban, bog, fen, and water. Comparative analysis showed that CVTNet outperforms contemporary algorithms such as Random Forest (RF), ViT, multi-layer perceptron mixer (MLP-mixer), and hybrid spectral net (HybridSN) classifiers. Additionally, the attention mechanism (AM) analysis and sensitivity analysis highlighted the crucial role of CA, SA, and ViT in focusing the model’s attention on critical regions, thereby improving the mapping of wetland regions. Despite challenges at class boundaries, particularly between bog and fen, and misclassifications of swamp pixels, CVTNet presents a solution for wetland mapping.

Keywords