IEEE Access (Jan 2024)
DETR Novel Small Target Detection Algorithm Based on Swin Transformer
Abstract
A small target object is an object whose bounding box is very small relative to the image: typically the ratio of the bounding box's width and height to the width and height of the original image is less than 0.1, or the ratio of the bounding box area to the image area is less than 0.03, or the absolute size is less than $32 \times 32$ pixels. Small target detection has important applications in industrial defect detection, medical image processing, intelligent security, unmanned driving, and many other fields. Although target detection has made great progress, that progress is largely limited to large target objects; because of small size, inconspicuous features, and insufficient data samples, the accuracy and speed of small target detection remain low. To solve this problem, this paper proposes a novel small target detection model: Swin Transformer’s DETR. First, Swin Transformer is used as the backbone to extract the global features and local information of small targets, and a three-layer feature pyramid structure is used for feature fusion at the Neck layer to improve computational efficiency and model accuracy. Second, the detector is optimized: it is replaced by a two-stage design, and the ReLU activation function of the FFN layer is replaced by the more recent SwiGLU activation function, to avoid vanishing and exploding gradients and to enhance the nonlinearity of the model. High-resolution input is adopted on the Tiny Person dataset, with the input size set to [1400, 800]. Experiments on the VOC and Tiny Person datasets give small target detection rates of 88.9% and 48.3%, respectively. The results show that the proposed Swin Transformer’s DETR model performs well on both datasets and has strong generalization ability, stability, and accuracy across different scenarios, exceeding the other algorithm models compared.
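The three small-target criteria stated in the abstract can be illustrated with a minimal Python sketch. The function name `is_small_target` and its parameter names are hypothetical and do not come from the paper. The sketch also assumes that the first criterion applies to both width and height, and that "absolute size" means a box area below 32 × 32 pixels; the paper may define these differently.

```python
def is_small_target(box_w, box_h, img_w, img_h,
                    side_ratio=0.1, area_ratio=0.03, abs_side=32):
    """Return True if a box meets any of the three small-target criteria.

    Hypothetical helper: thresholds follow the abstract, but the exact
    reading of each criterion is an assumption of this sketch.
    """
    if min(box_w, box_h, img_w, img_h) <= 0:
        raise ValueError("all dimensions must be positive")

    # Criterion 1 (assumed reading): width and height are each
    # less than 10% of the corresponding image side.
    relative_side = (box_w / img_w < side_ratio
                     and box_h / img_h < side_ratio)

    # Criterion 2: box area is less than 3% of the image area.
    relative_area = (box_w * box_h) / (img_w * img_h) < area_ratio

    # Criterion 3 (assumed reading): box area is less than 32 x 32 pixels.
    absolute = box_w * box_h < abs_side * abs_side

    # The abstract joins the criteria with "or", so any one suffices.
    return relative_side or relative_area or absolute


if __name__ == "__main__":
    # Example boxes on an image of the [1400, 800] input size.
    print(is_small_target(20, 20, 1400, 800))    # tiny box
    print(is_small_target(400, 300, 1400, 800))  # large box
```

Because the criteria are combined with "or", a box that is small in absolute terms counts as a small target even in a low-resolution image where its relative size is large.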
Keywords