IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2025)

CMDistill: Cross-Modal Distillation Framework for AAV Image Object Detection

  • Xiaozhong Tong,
  • Xiaojun Guo,
  • Xiaoyong Sun,
  • Runze Guo,
  • Shaojing Su,
  • Zhen Zuo

DOI
https://doi.org/10.1109/JSTARS.2024.3479717
Journal volume & issue
Vol. 18
pp. 1395 – 1409

Abstract

As autonomous aerial vehicles (AAVs) become more intelligent and carry increasingly diverse mission payloads, target detection methods have divided into single-modal and multimodal approaches with diverse and complex combinations. Balancing the two is challenging: multimodal detectors are expensive and complex, which makes them difficult to deploy directly on AAVs, whereas single-modal detectors offer limited detection accuracy. To overcome this limitation, we developed a cross-modal target detector called CMDistill. Specifically, we designed a distillation loss with three components. First, to reduce differences in feature knowledge across modal interlayers, we introduced a constraint based on the Pearson correlation coefficient to suppress negative knowledge. Second, we modeled the relational cues between features by computing affinity matrices over the deeper semantic features of the teacher and student models, so that relational knowledge is conveyed more accurately. Finally, the target bounding boxes and classification information predicted by the teacher were passed to the student. Experimental results on the aerial vehicle detection dataset showed that CMDistill achieved optimal performance on the RGB-only target detection task, with an average accuracy of 74%, and reached a detection performance comparable with that of multimodal approaches while using fewer computational resources.
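The three loss components described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of 1 - r as the feature term, cosine similarity for the affinity matrices, mean squared error for the output term, and the equal default weights are all assumptions chosen only to make the idea concrete on plain Python lists.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # correlation is undefined for constant input; treat as 0
    return cov / (sx * sy)

def feature_loss(student_feat, teacher_feat):
    """Assumed feature term: 1 - r, near zero when features correlate perfectly."""
    return 1.0 - pearson(student_feat, teacher_feat)

def affinity(feats):
    """Assumed affinity: N x N cosine-similarity matrix over N feature vectors."""
    norms = [math.sqrt(sum(v * v for v in f)) or 1.0 for f in feats]
    return [[sum(a * b for a, b in zip(fi, fj)) / (ni * nj)
             for fj, nj in zip(feats, norms)]
            for fi, ni in zip(feats, norms)]

def relation_loss(student_feats, teacher_feats):
    """Mean squared difference between student and teacher affinity matrices."""
    a_s, a_t = affinity(student_feats), affinity(teacher_feats)
    n = len(a_s)
    return sum((a_s[i][j] - a_t[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

def response_loss(student_out, teacher_out):
    """Assumed output term: mean squared error between flattened predictions."""
    return sum((s - t) ** 2
               for s, t in zip(student_out, teacher_out)) / len(student_out)

def distill_loss(s_feat, t_feat, s_rel, t_rel, s_out, t_out,
                 weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (weights[0] * feature_loss(s_feat, t_feat)
            + weights[1] * relation_loss(s_rel, t_rel)
            + weights[2] * response_loss(s_out, t_out))
```

In this sketch a student whose features are a positive scaling of the teacher's incurs almost no feature or relation loss, because both the Pearson coefficient and cosine similarity ignore positive scaling. That property is one plausible reason to prefer correlation-style terms over a direct element-wise distance when the teacher and student see different modalities.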

Keywords