Jisuanji kexue yu tansuo (Oct 2024)
Downsampling Algorithm with Fusion of Different Receptive Field Sizes in Deep Detection Methods
Abstract
The strength of deep detection models stems primarily from the feature representation ability of the backbone network, in which down-sampling plays a key role in semantic integration. However, existing down-sampling approaches often ignore the global structural information of features because they rely on small receptive fields. To address this issue, this paper proposes a plug-and-play dual-path down-sampling method (DPDM), which improves the backbone network's support for subsequent detection through an additional large-receptive-field branch. Building on the traditional small-receptive-field path, DPDM constructs an efficient large-receptive-field branch to capture the structural information of features. Inspired by the space-to-depth operation, this branch achieves the effect of a large receptive field under a conventional convolution kernel setting. The dual-path operation increases the diversity of features but does not by itself coordinate the two types of features. Therefore, DPDM subsequently uses channel concatenation and point-wise convolution to merge the features of the two paths. Taking the advanced YOLO family as the baseline, experimental evaluations of three models (YOLOX, YOLOv5, YOLOv6) on different datasets demonstrate the effectiveness of this method in improving detection accuracy.
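To make the large-receptive-field idea concrete, the following is a minimal sketch of a generic space-to-depth rearrangement in plain Python. It is not the authors' implementation: the function names `space_to_depth` and `input_extent`, the block size of 2, and the nested-list tensor layout are all assumptions chosen for illustration, and the convolution, channel concatenation, and point-wise merge steps of DPDM are omitted.

```python
def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) nested list into (C*block*block, H//block, W//block).

    Each block x block spatial patch is moved into the channel dimension,
    so spatial resolution is reduced without discarding any values.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    assert height % block == 0 and width % block == 0, "H and W must be divisible by block"
    out = []
    for c in range(channels):
        for dy in range(block):
            for dx in range(block):
                out.append([
                    [x[c][i * block + dy][j * block + dx] for j in range(width // block)]
                    for i in range(height // block)
                ])
    return out


def input_extent(kernel, block=2):
    """Side length, in input pixels, covered by one kernel x kernel window
    applied to the output of space_to_depth (hypothetical helper)."""
    return kernel * block


# One 4x4 channel holding the values 0..15 in row-major order.
feature = [[[r * 4 + c for c in range(4)] for r in range(4)]]
packed = space_to_depth(feature)

print(len(packed), len(packed[0]), len(packed[0][0]))  # channels, height, width
print(packed[0])
print(input_extent(3))
```

Under these assumptions, a conventional 3x3 kernel applied after the rearrangement spans a 6x6 region of the original feature map, which is one way a large receptive field can be obtained with a standard kernel size.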
Keywords