Drones (Sep 2024)

Drone-Based Visible–Thermal Object Detection with Transformers and Prompt Tuning

  • Rui Chen,
  • Dongdong Li,
  • Zhinan Gao,
  • Yangliu Kuai,
  • Chengyuan Wang

DOI
https://doi.org/10.3390/drones8090451
Journal volume & issue
Vol. 8, no. 9
p. 451

Abstract

The use of unmanned aerial vehicles (UAVs) for visible–thermal object detection has emerged as a powerful technique to improve accuracy and resilience in challenging conditions such as dim lighting and severe weather. However, most existing research relies on Convolutional Neural Network (CNN) frameworks, confining the Transformer’s attention mechanism to fusion modules and neglecting its potential for global feature modeling. To address this limitation, this study introduces a dual-modal object detection framework, Visual Prompt multi-modal Detection (VIP-Det), which uses the Transformer architecture as the primary feature extractor and integrates vision prompts for refined feature fusion. Our approach first trains a single-modal baseline model to establish robust feature representations, then fine-tunes it with data from the additional modality and with prompts. Tests on the DroneVehicle dataset show that our algorithm outperforms comparable Transformer-based methods in accuracy. These findings indicate that the proposed method advances UAV-based object detection and holds promise for autonomous surveillance and monitoring in varied and challenging environments.
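
The two-stage idea in the abstract (a single-modal baseline, then fine-tuning with a second modality and prompts) can be illustrated with a toy data-flow sketch. The abstract does not say how VIP-Det builds or injects its prompts, so everything below is an assumption for illustration only: the names `FrozenBackbone`, `PromptGenerator`, and `encode_with_prompts` are hypothetical, the "backbone" is a single attention layer with fixed weights, and the thermal-derived prompt tokens are simply prepended to the visible tokens, as in common visual prompt tuning setups.

```python
import math
import random


def linear(tokens, weight):
    """Apply a d_in x d_out weight matrix to every token (row vector)."""
    return [
        [sum(t[i] * weight[i][j] for i in range(len(t))) for j in range(len(weight[0]))]
        for t in tokens
    ]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append(
            [sum(w / total * v[j] for w, v in zip(exps, tokens)) for j in range(dim)]
        )
    return out


def random_matrix(rows, cols, rng):
    """Toy stand-in for learned weights or extracted patch tokens."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class FrozenBackbone:
    """Assumed stand-in for the stage-1 single-modal Transformer (not updated here)."""

    def __init__(self, dim, rng):
        self.weight = random_matrix(dim, dim, rng)

    def __call__(self, tokens):
        return linear(self_attention(tokens), self.weight)


class PromptGenerator:
    """Hypothetical projection that turns thermal tokens into prompt tokens."""

    def __init__(self, dim, rng):
        # In this sketch, these would be the parameters tuned in stage 2.
        self.weight = random_matrix(dim, dim, rng)

    def __call__(self, thermal_tokens):
        return linear(thermal_tokens, self.weight)


def encode_with_prompts(backbone, prompter, visible_tokens, thermal_tokens):
    """Prepend thermal-derived prompts, encode, and keep the visible positions."""
    prompts = prompter(thermal_tokens)
    encoded = backbone(prompts + visible_tokens)
    return encoded[len(prompts):]


rng = random.Random(0)
DIM = 4
backbone = FrozenBackbone(DIM, rng)
prompter = PromptGenerator(DIM, rng)
visible = random_matrix(6, DIM, rng)  # toy visible-image patch tokens
thermal = random_matrix(6, DIM, rng)  # toy thermal-image patch tokens
features = encode_with_prompts(backbone, prompter, visible, thermal)
print(len(features), len(features[0]))  # one fused feature vector per visible token
```

In this sketch the thermal modality influences the visible features only through attention over the prompt tokens, which is one way a prompt can carry a second modality into a backbone trained on the first.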

Keywords