An Improved Swin Transformer-Based Model for Remote Sensing Object Detection and Instance Segmentation

Xiangkai Xu; Zhejun Feng; Changqing Cao; Mengyuan Li; Jin Wu; Zengyan Wu; Yajie Shang; Shubing Ye

doi:10.3390/rs13234779

Remote Sensing (Nov 2021)

Affiliations

Xiangkai Xu: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Zhejun Feng: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Changqing Cao: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Mengyuan Li: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Jin Wu: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Zengyan Wu: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Yajie Shang: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China
Shubing Ye: School of Physics and Optoelectronic Engineering, Xidian University, 2 South TaiBai Road, Xi’an 710071, China

Journal volume & issue: Vol. 13, no. 23, p. 4779

Abstract

Object detection and instance segmentation in remote sensing images are widely studied research fields. Convolutional neural networks (CNNs) have shown shortcomings in object detection on remote sensing images. In recent years, the number of studies on transformer-based models has increased, and these studies have achieved good results. However, transformers still suffer from poor small-object detection and unsatisfactory segmentation of edge details. To address these problems, we improved the Swin transformer by combining the advantages of transformers and CNNs, and designed a local perception Swin transformer (LPSW) backbone to enhance the local perception of the network and to improve the detection accuracy of small-scale objects. We also designed a spatial attention interleaved execution cascade (SAIEC) network framework, which helps to strengthen the segmentation accuracy of the network. Because remote sensing mask datasets are scarce, we created the MRS-1800 remote sensing mask dataset. Finally, we combined the proposed backbone with the new network framework and conducted experiments on the MRS-1800 dataset. Compared with the Swin transformer, the proposed model improved mask AP by 1.7%, mask AP_S by 3.6%, AP by 1.1%, and AP_S by 4.6%, demonstrating its effectiveness and feasibility.

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
