IEEE Access (Jan 2020)
Multi-Attention Network for Stereo Matching
Abstract
In recent years, convolutional neural network (CNN) algorithms promote the development of stereo matching and make great progress, but some mismatches still occur in textureless, occluded and reflective regions. In feature extraction and cost aggregation, CNNs will greatly improve the accuracy of stereo matching by utilizing global context information and high-quality feature representations. In this paper, we design a novel end-to-end stereo matching algorithm named Multi-Attention Network (MAN). To obtain the global context information in detail at the pixel-level, we propose a Multi-Scale Attention Module (MSAM), combining a spatial pyramid module with an attention mechanism, when we extract the image features. In addition, we introduce a feature refinement module (FRM) and a 3D attention aggregation module (3D AAM) during cost aggregation so that the network can extract informative features with high representational ability and high-quality channel attention vectors. Finally, we obtain the final disparity through bilinear interpolation and disparity regression. We evaluate our method on the Scene Flow, KITTI 2012 and KITTI 2015 stereo datasets. The experimental results show that our method achieves state-of-the-art performance and that every component of our network is effective.
Keywords