Multimodal Feature-Guided Pretraining for RGB-T Perception

Junlin Ouyang; Pengcheng Jin; Qingwang Wang

doi:10.1109/JSTARS.2024.3454054

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Jan 2024)

Affiliations

Junlin Ouyang: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
Pengcheng Jin: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
Qingwang Wang: Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China

Journal volume & issue: Vol. 17, pp. 16041–16050

Abstract

Wide-range multiscale object detection for multispectral scene perception from a drone perspective is challenging. Previous RGB-T perception methods directly reuse backbones pretrained on RGB images for thermal infrared feature extraction, which introduces a domain shift. We propose a multimodal feature-guided masked reconstruction pretraining method, named M2FP, that aims to learn transferable representations for drone-based RGB-T environmental perception tasks without domain bias. This article makes two key contributions. 1) We design a cross-modal feature interaction module in M2FP, which encourages the modality-specific backbones to actively learn cross-modal feature representations and avoids modality bias. 2) We design a global-aware feature interaction and fusion module suitable for various downstream tasks, which enhances the model's environmental perception from a global perspective in wide-range drone-based scenes. We fine-tune M2FP on a drone-based object detection dataset (DroneVehicle) and a semantic segmentation dataset (Kust4K). On these two tasks, M2FP achieves state-of-the-art performance, improving on the second-best methods by 1.8% in mean average precision and 0.9% in mean intersection over union, respectively.
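
For readers unfamiliar with the segmentation metric quoted above, the following is a minimal sketch of the standard mean intersection over union computation. It is not the authors' evaluation code: the function name `mean_iou`, the flattened per-pixel label lists, and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration.

```python
from collections import Counter


def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred and target are flat sequences of integer class labels, one per pixel.
    (Hypothetical helper for illustration only.)
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    pred_count = Counter(pred)      # pixels predicted as each class
    target_count = Counter(target)  # pixels labelled as each class
    inter = Counter()               # pixels where prediction and label agree
    for p, t in zip(pred, target):
        if p == t:
            inter[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union == 0:
            continue  # class absent from both; skipped by assumption
        ious.append(inter[c] / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In the toy example the per-class IoU values are 1/3, 2/3 and 1/2, so the mean is 0.5. Evaluation toolkits differ in how they treat ignored labels and absent classes, so reported figures depend on those conventions.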

Published in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

ISSN: 1939-1404 (Print); 2151-1535 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Ocean engineering; Science: Physics: Geophysics. Cosmic physics
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=4609443
