Complex & Intelligent Systems (Aug 2024)

Repmono: a lightweight self-supervised monocular depth estimation architecture for high-speed inference

  • Guowei Zhang,
  • Xincheng Tang,
  • Li Wang,
  • Huankang Cui,
  • Teng Fei,
  • Hulin Tang,
  • Shangfeng Jiang

DOI
https://doi.org/10.1007/s40747-024-01575-0
Journal volume & issue
Vol. 10, no. 6
pp. 7927 – 7941

Abstract

Self-supervised monocular depth estimation has long attracted attention because it does not require ground truth data. Designing a lightweight architecture capable of fast inference is crucial for deployment on mobile devices. Current networks effectively integrate Convolutional Neural Networks (CNNs) with Transformers, achieving significant improvements in accuracy. However, this advantage comes at the cost of an increase in model size and a significant reduction in inference speed. In this study, we propose a network named Repmono, which includes an LCKT module with a large convolutional kernel and a RepTM module based on the structural reparameterisation technique. By combining these two modules, our network achieves both local and global feature extraction with fewer parameters and significantly faster inference. Our network, with 2.31MB parameters, shows significant accuracy improvements over Monodepth2 in experiments on the KITTI dataset. With uniform input dimensions, our network's inference speed is 53.7% faster than R-MSFM6, 60.1% faster than Monodepth2, and 81.1% faster than MonoVIT-small. Our code is available at https://github.com/txc320382/Repmono.
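
The abstract does not describe the internals of RepTM, so the following is only a generic sketch of the idea behind structural reparameterisation, not the authors' module. It assumes a single-channel block with a 3x3 branch, a 1x1 branch, and an identity shortcut, and omits batch normalisation. All function names are hypothetical. Because convolution is linear, the parallel branches used during training can be folded into one kernel for inference, giving the same output from a single convolution.

```python
def conv2d(x, k):
    """Zero-padded 'same' 2D cross-correlation of image x with an odd-sized square kernel k."""
    h, w = len(x), len(x[0])
    n = len(k)
    p = n // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(n):
                for b in range(n):
                    ii, jj = i + a - p, j + b - p
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[a][b]
            out[i][j] = s
    return out


def pad_kernel(k, size):
    """Place an odd-sized kernel k at the centre of a size x size zero kernel."""
    n = len(k)
    off = (size - n) // 2
    out = [[0.0] * size for _ in range(size)]
    for a in range(n):
        for b in range(n):
            out[a + off][b + off] = k[a][b]
    return out


def multi_branch(x, k3, k1):
    """Training-time form (assumed): 3x3 branch + 1x1 branch + identity shortcut."""
    y3 = conv2d(x, k3)
    y1 = conv2d(x, k1)
    return [[y3[i][j] + y1[i][j] + x[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


def merge_branches(k3, k1):
    """Inference-time form: fold all three branches into a single 3x3 kernel."""
    merged = [row[:] for row in k3]
    k1_padded = pad_kernel(k1, 3)
    for a in range(3):
        for b in range(3):
            merged[a][b] += k1_padded[a][b]
    merged[1][1] += 1.0  # the identity shortcut is a 3x3 kernel with 1 at the centre
    return merged


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    k3 = [[0.5, -1.0, 0.25], [1.0, 2.0, -0.5], [0.0, 0.75, 1.5]]
    k1 = [[3.0]]
    slow = multi_branch(x, k3, k1)
    fast = conv2d(x, merge_branches(k3, k1))
    same = all(abs(slow[i][j] - fast[i][j]) < 1e-9
               for i in range(3) for j in range(3))
    print("single merged conv matches multi-branch output:", same)
```

In this sketch the merged form runs one convolution instead of two plus an addition, which is the general reason reparameterised networks can be faster at inference while keeping the training-time multi-branch structure.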

Keywords