IEEE Access (Jan 2020)
Compact StereoNet: Stereo Disparity Estimation via Knowledge Distillation and Compact Feature Extractor
Abstract
Stereo disparity estimation is a difficult and crucial task in computer vision. Although many techniques have been proposed in recent years with the flourishing of deep learning, very few studies take into account the optimization of computational complexity and memory consumption. Most previous works rely on stacked 3D convolutional blocks to generate fine disparity maps, at the price of high computational cost and large memory consumption. To address this problem, we propose an efficient convolutional neural architecture for stereo disparity estimation. In particular, a compact and efficient multi-scale feature extractor named MCliqueNet, built from stacked CliqueBlocks, is proposed to extract more refined features for constructing a multi-scale cost volume. To reduce the computational cost while maintaining disparity accuracy, we use a knowledge distillation scheme to transfer contextual features from a teacher network to a student network. Furthermore, we present a novel adaptive SmoothL1 (ASL) loss for measuring the similarity between the contextual features of the teacher network and those of the student network, resulting in a more robust distillation process. Experimental results show that our method achieves competitive performance on the challenging Scene Flow and KITTI benchmarks while maintaining a very fast running speed.
Keywords