ICT Express (Apr 2023)
Combining transformer and CNN for object detection in UAV imagery
Abstract
Combining multiple models is a well-known technique for improving predictive performance in challenging tasks such as object detection in UAV imagery. In this paper, we propose fusing transformer-based and convolutional neural network (CNN)-based models using two approaches. First, we ensemble Swin Transformer and DetectoRS with a ResNet backbone, and compare the performance of four typical methods for combining the predictions of multiple object detection models. Second, we design a hybrid architecture by combining a Swin Transformer backbone with the neck of DetectoRS. We show that fusing the transformer-based and CNN-based models outperforms the respective baseline models.
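To make the first approach concrete, the following is a minimal sketch of one widely used way to combine the predictions of two detectors: pooling their boxes and applying per-class greedy non-maximum suppression (NMS). The abstract does not name the four combination methods that are compared, so NMS is used here only as an illustrative assumption. The function names (`iou`, `ensemble_nms`), the IoU threshold of 0.5, and the sample detections are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float, int]       # (box, confidence score, class id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_nms(model_outputs: List[List[Detection]],
                 iou_thr: float = 0.5) -> List[Detection]:
    """Pool detections from several models, then apply per-class greedy NMS."""
    # Merge all detections and visit them from highest to lowest score.
    pooled = [d for dets in model_outputs for d in dets]
    pooled.sort(key=lambda d: d[1], reverse=True)

    kept: List[Detection] = []
    for box, score, cls in pooled:
        # Keep a box only if no already-kept box of the same class overlaps
        # it by more than the threshold.
        if all(k[2] != cls or iou(box, k[0]) <= iou_thr for k in kept):
            kept.append((box, score, cls))
    return kept


# Hypothetical outputs of two detectors on the same image (illustrative only).
transformer_dets = [((10.0, 10.0, 50.0, 50.0), 0.90, 0),
                    ((100.0, 100.0, 140.0, 140.0), 0.60, 1)]
cnn_dets = [((12.0, 11.0, 51.0, 49.0), 0.80, 0),
            ((200.0, 200.0, 230.0, 230.0), 0.70, 0)]

fused = ensemble_nms([transformer_dets, cnn_dets])
for box, score, cls in fused:
    print(cls, score, box)
```

In this sketch the two overlapping class-0 boxes collapse into the higher-scoring one, while detections found by only one model are retained. Other combination schemes differ mainly in how overlapping boxes are treated, for example by lowering scores or averaging coordinates instead of discarding boxes.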