IEEE Access (Jan 2023)

Accelerating Deep Neural Networks for Efficient Scene Understanding in Multi-Modal Automotive Applications

  • Stavros Nousias,
  • Erion-Vasilis Pikoulis,
  • Christos Mavrokefalidis,
  • Aris S. Lalos

DOI
https://doi.org/10.1109/ACCESS.2023.3258400
Journal volume & issue
Vol. 11
pp. 28208–28221

Abstract


Environment perception constitutes one of the most critical operations performed by semi- and fully-autonomous vehicles. In recent years, Deep Neural Networks (DNNs) have become the standard tool for perception solutions owing to their impressive capabilities in analyzing and modelling complex and dynamic scenes from (often multi-modal) sensory inputs. However, the well-established performance of DNNs comes at the cost of increased time and storage complexity, which may become problematic in automotive perception systems due to the requirement for a short prediction horizon (as in many cases inference must be performed in real time) and the limited computational, storage, and energy resources of mobile systems. A common way of addressing this problem is to transform the original large pre-trained networks into new, smaller models by utilizing Model Compression and Acceleration (MCA) techniques, improving both their storage and execution efficiency. Within the MCA framework, in this paper we investigate the application of two state-of-the-art weight-sharing MCA techniques, namely one based on Vector Quantization (VQ) and one based on Dictionary Learning (DL), as well as two novel extensions, towards the acceleration and compression of widely used DNNs for 2D and 3D object detection in automotive applications. Apart from the individual (uni-modal) networks, we also present and evaluate a multi-modal late-fusion algorithm for combining the detection results of the 2D and 3D detectors. Our evaluation studies are carried out on the KITTI dataset. The obtained results lend themselves to a twofold interpretation. On the one hand, they showcase the significant acceleration and compression gains that can be achieved via the application of weight sharing on the selected DNN detectors, with limited accuracy loss, and highlight the performance differences between the two utilized weight-sharing approaches. On the other hand, they demonstrate the substantial boost in detection performance obtained by combining the outcomes of the two uni-modal detectors using the proposed late-fusion-based multi-modal approach. Indeed, as our experiments have shown, pairing the high-performance DL-based MCA technique with the loss-mitigating effect of the multi-modal fusion approach leads to highly accelerated models (up to approximately $2.5\times$ and $6\times$ for the 2D and 3D detectors, respectively), with the performance loss of the fused results ranging in most cases within single-digit figures (as low as around 1% for the class “cars”).
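
To make the weight-sharing idea concrete, the following is a minimal sketch (not the authors' implementation; the function names `vq_weight_sharing` and `reconstruct` are illustrative) of vector-quantization-based weight sharing: a layer's weights are clustered with k-means, so that only a small codebook and a per-weight index need to be stored, while inference uses the reconstructed (approximated) weights.

```python
# Illustrative sketch only (not the paper's code): weight sharing via
# vector quantization. A layer's weights are clustered with k-means; the
# network then stores one small codebook plus an integer index per weight.
import numpy as np
from sklearn.cluster import KMeans

def vq_weight_sharing(weights: np.ndarray, codebook_size: int = 256):
    """Cluster scalar weights into `codebook_size` shared values."""
    flat = weights.reshape(-1, 1)                       # each weight is one sample
    km = KMeans(n_clusters=codebook_size, n_init=10).fit(flat)
    codebook = km.cluster_centers_.ravel()              # shared weight values
    indices = km.labels_.reshape(weights.shape)         # per-weight codebook index
    return codebook, indices

def reconstruct(codebook: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Rebuild approximate dense weights from the codebook and indices."""
    return codebook[indices]

# Example on a hypothetical 128x128 fully connected layer.
w = np.random.randn(128, 128).astype(np.float32)
codebook, indices = vq_weight_sharing(w, codebook_size=64)
w_hat = reconstruct(codebook, indices)
print("mean absolute quantization error:", float(np.abs(w - w_hat).mean()))
```

The dictionary-learning variant mentioned in the abstract would instead represent groups of weights as combinations of learned dictionary atoms; the sketch above only illustrates the simpler VQ case.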

Keywords