International Journal of Advances in Signal and Image Sciences (Dec 2023)
REAL-TIME EMBEDDED SYSTEM OF MULTI-TASK CNN FOR ADVANCED DRIVING ASSISTANCE
Abstract
In this research, we developed a real-time embedded system for advanced driving assistance. Our approach uses a multi-task Convolutional Neural Network (CNN) that simultaneously performs three tasks: object detection, semantic segmentation, and disparity estimation. To cope with the limited resources of edge computing, the tasks share a common encoder and decoder. To improve computational efficiency, we replaced the conventional transposed convolution with a combination of depth-wise separable convolution and bilinear interpolation. This change reduced the multiply-accumulate operations to 23.3% and the convolution parameters to 16.7% of their original values. Our experiments show that reducing the decoder's complexity does not compromise recognition accuracy and in fact improves it. Furthermore, we adopted a semi-supervised learning approach to improve network accuracy when the network is deployed in a target domain that differs from the source domain used for training. Specifically, we used manually annotated ground truth only for object detection to train the whole network for the target domain. For the foreground object categories, we generate pseudo ground truth for semantic segmentation from the bounding boxes of object detection and refine it iteratively. For the background categories, in contrast, we use the initial inference results as pseudo ground truth without further adjustment. Because this method provides the segmentation task with the rough position, size, and shape of each object, it enables semantic segmentation of object classes with widely varying appearances. Our experimental results confirm that this semi-supervised learning technique improves both object detection and semantic segmentation accuracy.
We implemented this multi-task CNN on an embedded Graphics Processing Unit (GPU) board, added multi-object tracking functionality, and achieved a throughput of 18 fps at a power consumption of 26 W.
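To illustrate why replacing a transposed convolution with a depth-wise separable convolution followed by bilinear interpolation lowers cost, the following minimal sketch counts parameters and multiply-accumulate operations (MACs) for a single upsampling block. The layer sizes, the 3x3 kernel, the 2x scale factor, the placement of the separable convolution before the upsampling, and the estimate of four MACs per interpolated value are all illustrative assumptions and are not taken from the network described above, so the resulting ratios are not expected to match the reported 23.3% and 16.7%.

```python
def transposed_conv_cost(c_in, c_out, k, h_in, w_in):
    """Parameters and MACs of a k x k transposed convolution (bias ignored)."""
    params = k * k * c_in * c_out
    # Each input pixel is multiplied by the full kernel for every output channel.
    macs = h_in * w_in * params
    return params, macs


def separable_conv_cost(c_in, c_out, k, h, w):
    """Parameters and MACs of a k x k depth-wise conv plus a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 convolution that mixes channels
    params = depthwise + pointwise
    macs = h * w * params         # both convolutions run at resolution h x w
    return params, macs


def bilinear_upsample_macs(channels, h_out, w_out):
    """Rough MAC estimate for bilinear interpolation (assumption: 4 per output value)."""
    return 4 * channels * h_out * w_out


def compare_upsampling_blocks(c_in=128, c_out=64, k=3, h_in=32, w_in=64, scale=2):
    """Return (parameter ratio, MAC ratio) of the separable block to the transposed one.

    All default sizes are hypothetical and chosen only for illustration.
    """
    t_params, t_macs = transposed_conv_cost(c_in, c_out, k, h_in, w_in)
    # Assumed ordering: separable convolution at input resolution, then upsampling.
    s_params, s_macs = separable_conv_cost(c_in, c_out, k, h_in, w_in)
    s_macs += bilinear_upsample_macs(c_out, h_in * scale, w_in * scale)
    return s_params / t_params, s_macs / t_macs


if __name__ == "__main__":
    param_ratio, mac_ratio = compare_upsampling_blocks()
    print(f"parameters: {param_ratio:.1%} of transposed convolution")
    print(f"MACs:       {mac_ratio:.1%} of transposed convolution")
```

The saving comes from factorizing the k x k x C_in x C_out kernel into a per-channel spatial filter and a 1 x 1 channel mixer, while bilinear interpolation adds no learned parameters at all.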
Keywords