UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Zhanyuan Chang; Mingyu Xu; Yuwen Wei; Jie Lian; Chongming Zhang; Chuanjiang Li

doi:10.3390/s24206655

Sensors (Oct 2024)

UNeXt: An Efficient Network for the Semantic Segmentation of High-Resolution Remote Sensing Images

Zhanyuan Chang,
Mingyu Xu,
Yuwen Wei,
Jie Lian,
Chongming Zhang,
Chuanjiang Li

Affiliations

Zhanyuan Chang: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Mingyu Xu: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Yuwen Wei: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Jie Lian: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Chongming Zhang: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China
Chuanjiang Li: College of Information, Mechanical and Electrical Engineering, Shanghai Normal University, Shanghai 200234, China

DOI: https://doi.org/10.3390/s24206655
Journal volume & issue: Vol. 24, no. 20
p. 6655

Abstract

Read online

The application of deep neural networks for the semantic segmentation of remote sensing images is a significant research area within the field of the intelligent interpretation of remote sensing data. The semantic segmentation of remote sensing images holds great practical value in urban planning, disaster assessment, the estimation of carbon sinks, and other related fields. With the continuous advancement of remote sensing technology, the spatial resolution of remote sensing images is gradually increasing. This increase in resolution brings about challenges such as significant changes in the scale of ground objects, redundant information, and irregular shapes within remote sensing images. Current methods leverage Transformers to capture global long-range dependencies. However, the use of Transformers introduces higher computational complexity and is prone to losing local details. In this paper, we propose UNeXt (UNet+ConvNeXt+Transformer), a real-time semantic segmentation model tailored for high-resolution remote sensing images. To achieve efficient segmentation, UNeXt uses the lightweight ConvNeXt-T as the encoder and a lightweight decoder, Transnext, which combines a Transformer and CNN (Convolutional Neural Networks) to capture global information while avoiding the loss of local details. Furthermore, in order to more effectively utilize spatial and channel information, we propose a SCFB (SC Feature Fuse Block) to reduce computational complexity while enhancing the model’s recognition of complex scenes. A series of ablation experiments and comprehensive comparative experiments demonstrate that our method not only runs faster than state-of-the-art (SOTA) lightweight models but also achieves higher accuracy. Specifically, our proposed UNeXt achieves 85.2% and 82.9% mIoUs on the Vaihingen and Gaofen5 (GID5) datasets, respectively, while maintaining 97 fps for 512 × 512 inputs on a single NVIDIA GTX 4090 GPU, outperforming other SOTA methods.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors

About the journal

Abstract

Keywords