Cogent Engineering (Dec 2024)
Perception system for navigation using Unified Scene Perception Network
Abstract
In scene interpretation for autonomous vehicles, identifying and classifying the objects in an environment is crucial. Deep learning techniques such as semantic segmentation, depth estimation, instance segmentation, and object detection make it possible to build efficient models that identify and classify the objects in a scene with sufficient accuracy. In this work, an architectural framework called the Unified Scene Perception Network (USPNet) is designed to handle semantic segmentation and depth estimation simultaneously. USPNet consists of a joint encoder, based on vision transformers (ViT), that captures and extracts high-level features from the input data, and two dedicated decoders, one for semantic segmentation and one for depth estimation. Within these decoders, residual attention blocks such as the Convolutional Block Attention Module (CBAM) are implemented. Extensive experimental validation shows that USPNet achieves a mean Intersection over Union (mIoU) of 71.26%, and its segmentation accuracy of 86.47% outperforms other models, highlighting its effectiveness in pixel-wise classification. The proposed USPNet is therefore effective and offers promising prospects for advancing the ability of autonomous driving systems to thoroughly analyze and navigate complex real-world scenarios.
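For readers unfamiliar with the two segmentation metrics quoted above, the following is a minimal sketch of how mIoU and pixel-wise accuracy are commonly computed from a confusion matrix. It is an illustration only, not the evaluation code used in this work: the function names, the flattened label lists, and the three-class toy example are all assumptions, and classes absent from both prediction and ground truth are simply skipped when averaging.

```python
# Illustrative sketch (not the authors' evaluation code): the helper names
# and the toy labels below are hypothetical.

def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the ground truth."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class c pixels missed
        fp = sum(cm[r][c] for r in range(n)) - tp  # pixels wrongly labeled c
        denom = tp + fp + fn
        if denom:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (flattened label maps).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(f"pixel accuracy = {pixel_accuracy(cm):.4f}")
print(f"mIoU           = {mean_iou(cm):.4f}")
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so mIoU is lower than pixel accuracy. This is typical: mIoU penalizes both false positives and false negatives for every class equally, regardless of how many pixels the class covers.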
Keywords