Cross-Modal Local Calibration and Global Context Modeling Network for RGB–Infrared Remote-Sensing Object Detection

Jin Xie; Jing Nie; Bonan Ding; Mingyang Yu; Jiale Cao

doi:10.1109/JSTARS.2023.3315544

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2023)

Affiliations

Jin Xie: School of Big Data and Software Engineering, Chongqing University, Chongqing, China
Jing Nie: School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
Bonan Ding: School of Big Data and Software Engineering, Chongqing University, Chongqing, China
Mingyang Yu: School of Big Data and Software Engineering, Chongqing University, Chongqing, China
Jiale Cao: School of Electrical and Information Engineering, Tianjin University, Tianjin, China

DOI: https://doi.org/10.1109/JSTARS.2023.3315544
Journal volume & issue: Vol. 16
pp. 8933 – 8942

Abstract

RGB–infrared object detection in remote-sensing images is crucial for achieving around-the-clock surveillance of unmanned aerial vehicles. Deep-learning-based RGB–infrared remote-sensing object detection methods usually mine the complementary information in the RGB and infrared modalities through feature aggregation, which makes detection robust in around-the-clock applications. Most existing methods aggregate features from RGB and infrared images with elementwise operations (e.g., elementwise addition or concatenation). The detection accuracy of these methods is limited for two main reasons: local location misalignment across modalities and insufficient extraction of nonlocal contextual information. To address these issues, we propose a cross-modal local calibration and global context modeling network (CLGNet), consisting of two novel modules: a cross-modal local calibration (CLC) module and a cross-modal global context (CGC) modeling module. The CLC module first aligns features from different modalities and then aggregates them selectively. The CGC module is embedded into the backbone network to capture cross-modal nonlocal long-range dependencies. Experimental results on two popular RGB–infrared remote-sensing object detection datasets, DroneVehicle and VEDAI, demonstrate the effectiveness and efficiency of our CLGNet.
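
As a rough illustration of the baseline fusion operations the abstract mentions (elementwise addition and concatenation), the following NumPy sketch may help. It is not the CLC or CGC module and is not taken from the paper; the function names `fuse_add` and `fuse_concat`, the channel-first `(C, H, W)` layout, and the random feature maps are all assumptions made for this example. The last lines shift the infrared features by one pixel to show that such fusion pairs each location only with the same location in the other modality, so a spatial offset between modalities changes the fused result.

```python
import numpy as np


def fuse_add(rgb_feat, ir_feat):
    """Elementwise addition: the output keeps the input shape (C, H, W)."""
    if rgb_feat.shape != ir_feat.shape:
        raise ValueError("feature maps must have identical shapes")
    return rgb_feat + ir_feat


def fuse_concat(rgb_feat, ir_feat):
    """Channel concatenation: the output has shape (C_rgb + C_ir, H, W)."""
    if rgb_feat.shape[1:] != ir_feat.shape[1:]:
        raise ValueError("spatial sizes must match")
    return np.concatenate([rgb_feat, ir_feat], axis=0)


# Hypothetical feature maps standing in for backbone outputs (assumption).
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 4, 4))
ir_feat = rng.standard_normal((8, 4, 4))

added = fuse_add(rgb_feat, ir_feat)        # shape (8, 4, 4)
stacked = fuse_concat(rgb_feat, ir_feat)   # shape (16, 4, 4)

# Simulate a one-pixel horizontal misalignment of the infrared features.
ir_shifted = np.roll(ir_feat, shift=1, axis=2)
added_shifted = fuse_add(rgb_feat, ir_shifted)
```

Neither operation compensates for the shift: `added_shifted` differs from `added` because each RGB location is combined with whatever infrared value happens to sit at the same index.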

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
