IEEE Access (Jan 2023)

Complex Scene Segmentation With Local to Global Self-Attention Module and Feature Alignment Module

  • Xianfeng Ou,
  • Hanpu Wang,
  • Xinzhong Liu,
  • Jun Zheng,
  • Zhihao Liu,
  • Shulun Tan,
  • Hongzhi Zhou

DOI
https://doi.org/10.1109/ACCESS.2023.3311264
Journal volume & issue
Vol. 11
pp. 96530 – 96542

Abstract

It is challenging to accurately model the local and global context during complex scene segmentation. To solve this problem, this paper proposes a scene semantic segmentation network containing a local-to-global self-attention module and a feature alignment module. The local-to-global self-attention module is designed to combine local and global features, in which the transformer backbone treats all patches equally in the global scope to extract high-level features. The improved masked transformer with feature alignment module (MtFAM), which combines the masked transformer and the feature alignment module into a new decoder structure, is designed to fuse the features obtained from the vision transformer backbone and the local-to-global self-attention module. Experimental results demonstrate that the proposed structure shows better performance, improving mIoU by 3.63% on the ADE20K validation dataset compared to ViT-Tiny. In particular, it obtains a 2.23% higher mIoU than the Segmenter method using the same transformer backbone on this challenging scene segmentation benchmark.
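The abstract does not specify the internals of the local-to-global self-attention module, but the general idea of combining windowed (local) and full-sequence (global) self-attention can be sketched as follows. This is a minimal NumPy illustration under assumed simplifications (identity Q/K/V projections, a single head, and a hypothetical mixing weight `alpha`), not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    # Standard scaled dot-product attention.
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

def local_to_global_attention(x, window=4, alpha=0.5):
    """Illustrative local-to-global mixing over patch tokens.

    x: (num_patches, dim) patch embeddings.
    window: size of the local attention neighborhood (assumed).
    alpha: hypothetical weight blending local and global branches.
    """
    n, _ = x.shape
    # Global branch: every patch attends to all patches equally.
    global_out = self_attention(x, x, x)
    # Local branch: attention restricted to non-overlapping windows.
    local_out = np.zeros_like(x)
    for s in range(0, n, window):
        w = x[s:s + window]
        local_out[s:s + window] = self_attention(w, w, w)
    return alpha * local_out + (1 - alpha) * global_out

# Example: 8 patch tokens of dimension 16.
tokens = np.random.default_rng(0).normal(size=(8, 16))
fused = local_to_global_attention(tokens)
```

In a real network the local and global branches would use learned projections and multiple heads; the sketch only shows how the two attention scopes are computed and fused.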

Keywords