Zhejiang Daxue xuebao. Lixue ban (Mar 2024)
High-resolution image semantic segmentation network combining channel interaction spatial group attention and pyramid pooling
Abstract
High spatial resolution remote sensing images contain rich information, so studying their semantic segmentation is of great importance. Traditional machine learning methods suffer from low accuracy and efficiency when segmenting high-resolution remote sensing images. In recent years, deep learning has developed rapidly and has become the mainstream approach to image semantic segmentation. Some scholars have introduced SegNet, Deeplabv3+, U-Net and other neural networks into remote sensing image semantic segmentation, but these networks achieve only limited results. This paper improves the U-Net network for semantic segmentation of remote sensing images. First, a channel interaction and spatial group attention module (CISGAM), an improved convolutional attention module, is embedded in the feature extraction stage of U-Net so that the network can obtain more effective features. Second, residual modules replace the ordinary convolutional layers in the decoder to avoid model degradation. In addition, an attention pyramid pooling module (APPM) equipped with CISGAM connects the encoder and decoder of U-Net to strengthen the extraction of multi-scale features. Finally, experiments are carried out on the UC Merced dataset with 0.3 m resolution and the GID dataset with 1 m resolution. Compared with the original U-Net and Deeplabv3+ networks, the mean intersection over union (MIoU) of our method on the UC Merced dataset increases by 14.56% and 8.72%, and the mean pixel accuracy (MPA) increases by 12.71% and 8.24%, respectively. In the segmentation results on the GID dataset, the accuracy for water, buildings and other ground objects is also greatly improved, and CISGAM and APPM achieve a certain improvement over the original CBAM and PPM. The experimental results show that the model is more feasible and robust than traditional networks and that its stronger feature extraction capability improves the accuracy of semantic segmentation of high-resolution remote sensing images, providing a new approach for the intelligent interpretation of high-resolution remote sensing images.
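The abstract does not give implementation details for CISGAM, so the following is a minimal PyTorch-style sketch of the kind of module the name suggests: channel attention with cross-channel interaction (here an ECA-style 1-D convolution, an assumption) followed by spatial attention computed independently within channel groups. All class names and hyper-parameters (k_size, groups) are illustrative placeholders, not the authors' code.

```python
# Hedged sketch of a "channel interaction + spatial group attention" block.
# The paper's exact design is not given in the abstract; this assumes a
# CBAM-like two-stage layout with ECA-style channel interaction.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelInteraction(nn.Module):
    """Channel attention via a 1-D conv over pooled channel descriptors."""
    def __init__(self, k_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)

    def forward(self, x):
        # x: (B, C, H, W) -> per-channel weights (B, C, 1, 1)
        y = F.adaptive_avg_pool2d(x, 1)                 # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))    # interact across channels
        y = torch.sigmoid(y.transpose(1, 2).unsqueeze(-1))
        return x * y


class SpatialGroupAttention(nn.Module):
    """Spatial attention computed separately for each channel group."""
    def __init__(self, groups: int = 4):
        super().__init__()
        self.groups = groups
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        b, c, h, w = x.shape
        g = x.view(b * self.groups, c // self.groups, h, w)
        avg = g.mean(dim=1, keepdim=True)               # per-group mean map
        mx, _ = g.max(dim=1, keepdim=True)              # per-group max map
        att = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return (g * att).view(b, c, h, w)


class CISGAM(nn.Module):
    """Channel interaction followed by spatial group attention (assumed order)."""
    def __init__(self, groups: int = 4):
        super().__init__()
        self.ca = ChannelInteraction()
        self.sa = SpatialGroupAttention(groups)

    def forward(self, x):
        return self.sa(self.ca(x))


x = torch.randn(2, 64, 128, 128)
y = CISGAM(groups=4)(x)   # shape preserved: (2, 64, 128, 128)
```

The reported MIoU and MPA figures follow the standard definitions over a per-class confusion matrix; as a reference (not specific to this paper), they can be computed as:

```python
def mean_iou(conf: torch.Tensor) -> torch.Tensor:
    """MIoU from a (K, K) confusion matrix (rows = ground truth)."""
    tp = conf.diag().float()
    return (tp / (conf.sum(0) + conf.sum(1) - tp)).mean()


def mean_pixel_accuracy(conf: torch.Tensor) -> torch.Tensor:
    """MPA: mean over classes of TP / ground-truth pixels per class."""
    return (conf.diag().float() / conf.sum(1)).mean()
```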
Keywords