IEEE Access (Jan 2023)

SDDS-Net: Space and Depth Encoder-Decoder Convolutional Neural Networks for Real-Time Semantic Segmentation

  • Hatem Ibrahem,
  • Ahmed Salem,
  • Hyun-Soo Kang

DOI
https://doi.org/10.1109/ACCESS.2023.3327323
Journal volume & issue
Vol. 11
pp. 119362–119372

Abstract

In this paper, we propose novel convolutional encoder-decoder architectures for real-time semantic segmentation based on an image-to-image translation approach via space-to-depth and depth-to-space modules. The proposed architectures compress the spatial information of the image using the space-to-depth (SD) module instead of the commonly used pooling methods (max-pooling and average-pooling) or strided convolutions. The SD module reduces the image size while preserving the spatial information of the image as extra depth information; this is preferable to pooling, which discards information and image detail. We also propose a lightweight and simple decoder stage using the depth-to-space (DS) module, which constructs a high-resolution dense prediction map from a large number of low-resolution feature maps. The proposed architectures learn image classification and semantic segmentation efficiently, with high accuracy and a high average processing speed. We trained and tested our proposed architectures on image classification benchmarks (CIFAR10 and Tiny ImageNet) and on indoor and outdoor semantic segmentation benchmarks, namely NYU-depthV2 and CITYSCAPES. The proposed architectures attain high accuracy in classification (94.28% on CIFAR10 and 72.25% on Tiny ImageNet) and high mean average precision and pixel accuracy in semantic segmentation (pixel accuracy of 78.55% on NYU-depthV2 and 87.9% on CITYSCAPES) while maintaining real-time frame-processing speed, outperforming recent state-of-the-art methods in semantic segmentation.
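To make the SD/DS idea concrete, below is a minimal sketch assuming the standard block-rearrangement definition of these operations, as implemented by PyTorch's pixel_unshuffle and pixel_shuffle; the paper's modules may differ in implementation detail. SD folds each r×r spatial block into r² channels (lossless downsampling), and DS is its exact inverse, which is what makes the pair preferable to pooling.

```python
import torch
import torch.nn.functional as F

# Space-to-depth (SD): rearrange each r x r spatial block into r^2 channels,
# halving H and W (for r = 2) without discarding any pixel values.
# Depth-to-space (DS): the exact inverse, rebuilding spatial resolution
# from channel depth. Both shown here via PyTorch's built-in ops; the
# paper's own modules are assumed to follow this standard definition.

x = torch.randn(1, 3, 64, 64)  # (N, C, H, W) input feature map

sd = F.pixel_unshuffle(x, downscale_factor=2)  # -> (1, 12, 32, 32)
ds = F.pixel_shuffle(sd, upscale_factor=2)     # -> (1, 3, 64, 64)

# SD followed by DS reconstructs the input exactly, unlike max/avg pooling,
# which cannot be inverted because pooling discards information.
assert torch.equal(ds, x)
```

In a decoder, the same DS rearrangement lets a stack of many low-resolution feature maps be reshaped directly into a single high-resolution prediction map, avoiding learned upsampling layers and keeping the stage lightweight.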

Keywords