IET Computer Vision (Dec 2024)
DEUFormer: High‐precision semantic segmentation for urban remote sensing images
Abstract
Abstract Urban remote sensing image semantic segmentation has a wide range of applications, such as urban planning, resource exploration, intelligent transportation, and other scenarios. Although UNetFormer performs well by introducing the self‐attention mechanism of Transformer, it still faces challenges arising from relatively low segmentation accuracy and significant edge segmentation errors. To this end, this paper proposes DEUFormer by employing a special weighted sum method to fuse the features of the encoder and the decoder, thus capturing both local details and global context information. Moreover, an Enhanced Feature Refinement Head is designed to finely re‐weight features on the channel dimension and narrow the semantic gap between shallow and deep features, thereby enhancing multi‐scale feature extraction. Additionally, an Edge‐Guided Context Module is introduced to enhance edge areas through effective edge detection, which can improve edge information extraction. Experimental results show that DEUFormer achieves an average Mean Intersection over Union (mIoU) of 53.8% on the LoveDA dataset and 69.1% on the UAVid dataset. Notably, the mIoU of buildings in the LoveDA dataset is 5.0% higher than that of UNetFormer. The proposed model outperforms methods such as UNetFormer on multiple datasets, which demonstrates its effectiveness.
Keywords