IEEE Access (Jan 2025)

TransDeep: Transformer-Integrated DeepLabV3+ for Image Semantic Segmentation

  • Tengfei Chai,
  • Zhiguo Xiao,
  • Xiangfeng Shen,
  • Qian Liu,
  • NianFeng Li,
  • Tong Guan,
  • Jia Tian

DOI
https://doi.org/10.1109/ACCESS.2024.3525065
Journal volume & issue
Vol. 13
pp. 6277–6291

Abstract

In recent years, image semantic segmentation algorithms have made significant progress, driven by deep learning, and are widely used in fields such as medical image analysis, assistive technology for visually impaired people, and autonomous driving. To address problems such as the inability of many segmentation algorithms to fully capture global context, low computational efficiency, and insufficient fusion of contextual information, this article integrates the Transformer and Coordinate Attention (CA) mechanisms into the DeepLabV3+ network and proposes the TransDeep network. In the encoder, two backbones, Xception and MobileNetV2, are first compared for feature extraction, and the better one is selected. Building on this lightweight backbone, the Transformer mechanism is applied to the backbone's high-level features to enhance long-range dependencies. Second, a Coordinate Attention (CA) module is added after the backbone's low-level features to strengthen edge and detail information. Finally, a CA module is added after the ASPP module so that the model focuses on key image features while effectively filtering out irrelevant background information. Experimental results show that TransDeep improves the accuracy of key categories and effectively improves the network's segmentation accuracy. It achieved an MIoU of 73.5% on the Pascal test set and 79.95% on the CamVid test set, improvements of 2.10% and 2.61%, respectively, over the baseline model.
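The Coordinate Attention re-weighting described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the authors' implementation: the real CA module passes the directionally pooled features through shared 1x1 convolutions with batch normalization and a channel-reduction factor before producing the attention maps; here those learned transforms are omitted so the example stays dependency-free, keeping only the core idea of pooling along each spatial axis and re-weighting the feature map in both directions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coord_attention(x):
    """Simplified Coordinate Attention sketch (learned 1x1 convs omitted).

    x: feature map of shape (C, H, W).
    Returns a re-weighted feature map of the same shape.
    """
    c, h, w = x.shape
    # Directional pooling: average over width gives per-row context (C, H),
    # average over height gives per-column context (C, W).
    pool_h = x.mean(axis=2)
    pool_w = x.mean(axis=1)
    # In the full module these would pass through shared 1x1 convs and a
    # nonlinearity; here we squash them directly into attention weights.
    a_h = sigmoid(pool_h)[:, :, None]  # (C, H, 1): row-wise attention
    a_w = sigmoid(pool_w)[:, None, :]  # (C, 1, W): column-wise attention
    # Re-weight the features along both spatial directions.
    return x * a_h * a_w

# Usage: positions whose row/column context is stronger keep more signal,
# which is how CA emphasizes edges and details in the low-level features.
features = np.random.default_rng(0).normal(size=(8, 16, 16))
out = coord_attention(features)
print(out.shape)  # (8, 16, 16)
```

Because the attention maps are factored into a per-row and a per-column term, the module encodes positional information along each axis at negligible cost, which is why it suits lightweight backbones such as MobileNetV2.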

Keywords