YOLO-DCTI: Small Object Detection in Remote Sensing Base on Contextual Transformer Enhancement

Lingtong Min; Ziman Fan; Qinyi Lv; Mohamed Reda; Linghao Shen; Binglu Wang

doi:10.3390/rs15163970

Remote Sensing (Aug 2023)

Affiliations

Lingtong Min: School of Electronic Information, Northwestern Polytechnical University, Xi’an 710072, China
Ziman Fan: School of Electronic Information, Northwestern Polytechnical University, Xi’an 710072, China
Qinyi Lv: School of Electronic Information, Northwestern Polytechnical University, Xi’an 710072, China
Mohamed Reda: Department of Avionics, Military Technical College, Cairo 4393010, Egypt
Linghao Shen: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Binglu Wang: School of Automation, Northwestern Polytechnical University, Xi’an 710072, China

DOI: https://doi.org/10.3390/rs15163970
Journal volume & issue: Vol. 15, no. 16, p. 3970

Abstract

Object detection is a fundamental task in remote sensing image processing, and the detection of small or tiny objects is one of its core components. Although combining CNNs with transformer networks has considerably advanced small object detection, there is still untapped potential for improving how information associated with small objects is extracted and used. In transformer structures in particular, the global modeling of pixel-level information within small objects disregards the complex, intertwined interplay between spatial context information and channel information, so valuable information is prone to being obscured or lost. To mitigate this limitation, we propose YOLO-DCTI, a framework that builds on the Contextual Transformer (CoT) to detect small or tiny objects. Specifically, within CoT, we incorporate global residual and local fusion mechanisms throughout the entire input-to-output pipeline. This integration allows the network’s intrinsic representations to be explored at deeper levels and promotes the fusion of spatial contextual attributes with channel characteristics. Moreover, we propose an improved decoupled contextual transformer detection head, denoted DCTI, to resolve the feature conflicts that arise when classification and regression are performed concurrently. Experimental results on the DOTA, VisDrone, and NWPU VHR-10 datasets show that, built on the powerful real-time detection network YOLOv7, the proposed method achieves a better balance between speed and accuracy for tiny targets.

Published in Remote Sensing

ISSN: 2072-4292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science
Website: http://www.mdpi.com/journal/remotesensing/
