IEEE Access (Jan 2022)
Semantic Segmentation Optimized for Low Compute Embedded Devices
Abstract
Deploying deep convolutional neural networks (CNNs) on low-compute devices is an increasingly important area of research. Wearable and robotic systems use semantic information for efficient navigation and to gain contextual awareness. However, real-time semantic segmentation is challenging on low-compute devices. We propose a compact CNN for real-time applications on low-compute devices. Our decoder uses pixel shuffling to achieve efficient inference. We compared our CNN with state-of-the-art models ranked in the Cityscapes real-time semantic segmentation category. We propose a modified Net score that incorporates frames per second (FPS) to complement the traditional metrics of mean Intersection over Union (mIoU), the number of multiply-accumulate operations (GFLOPs), and the number of parameters when evaluating mobile computing performance. Our CNN achieved 65.7 FPS on a GTX 1080 and 76.7% mIoU without ImageNet pre-training, while requiring 25 GFLOPs and 4.55M parameters, resulting in a modified Net score of 127.53, compared with 119.89 for the Deep Dual Resolution Network (DDRNET23_slim) and 115.39 for Regseg. On the CamVid test set, our CNN (83.3% mIoU and 354 FPS with TensorRT) surpassed the published mIoU values for Regseg and other CNNs; this accuracy and frame rate on CamVid represent state-of-the-art performance. To demonstrate compatibility with low-compute devices, we evaluated our CNN on two mobile computing platforms and showed real-time performance of 57 FPS on a Jetson NX 8 GB with TensorRT and 12.65 FPS on a Jetson Xavier AGX without TensorRT. Our CNN can operate with high accuracy on low-compute devices to support systems that benefit from semantic information.
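For context, the original Net score (Wong, 2018) combines accuracy a(N), parameter count p(N) in millions, and operation count m(N) as

\Omega(N) = 20 \log_{10}\!\left(\frac{a(N)^{\alpha}}{p(N)^{\beta}\, m(N)^{\gamma}}\right),

with \alpha = 2 and \beta = \gamma = 0.5 in the original formulation. The modified score named above additionally rewards throughput; a minimal sketch, assuming the frame rate d(N) enters the numerator with its own exponent \delta (the abstract does not state the exact exponents used), is

\Omega'(N) = 20 \log_{10}\!\left(\frac{a(N)^{\alpha}\, d(N)^{\delta}}{p(N)^{\beta}\, m(N)^{\gamma}}\right).

The pixel shuffling in the decoder refers to sub-pixel convolution: a convolution expands the channel dimension by the square of the upscale factor, and a pixel-shuffle operation rearranges those channels into a higher-resolution feature map, avoiding costly transposed convolutions. The following is a minimal PyTorch sketch; the channel counts and scale factor are illustrative assumptions, not the paper's exact decoder configuration.

import torch
import torch.nn as nn

class PixelShuffleUp(nn.Module):
    """Illustrative pixel-shuffle (sub-pixel convolution) upsampling block."""
    def __init__(self, in_ch: int, out_ch: int, scale: int = 2):
        super().__init__()
        # A 3x3 convolution expands channels by scale**2, then PixelShuffle
        # rearranges them into a (scale x scale) larger feature map.
        self.proj = nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.proj(x))

# Example: upsample a low-resolution feature map by 2x.
x = torch.randn(1, 128, 64, 128)
y = PixelShuffleUp(128, 64, scale=2)(x)  # -> shape (1, 64, 128, 256)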
Keywords