IEEE Access (Jan 2025)
Heavy and Lightweight Deep Learning Models for Semantic Segmentation: A Survey
Abstract
Semantic segmentation is an important computer vision task with numerous real-world applications, such as autonomous driving, video surveillance, medical image analysis, robotics, and augmented reality, and its popularity has grown with the development of deep learning approaches. We provide a detailed review of the most significant methods for both heavy and lightweight two-dimensional (2D) semantic segmentation. The review spans the introduction of convolutional neural networks through the adoption of the Transformer architecture, a widely used model that achieves state-of-the-art results in several artificial intelligence fields. The methods are described from an architectural design perspective, covering encoder-decoder architectures, multi-resolution branch approaches, two-pathway encoder architectures, attention-based models, and pyramid-based models. Additionally, some of the most popular datasets and performance metrics are presented. Further, we investigate the limitations of these methods, compare their performance on the Pascal VOC 2012, Cityscapes, and ADE20K datasets, and finally indicate future research directions.
Keywords