Alexandria Engineering Journal (Dec 2024)
Efficient multi-level cross-modal fusion and detection network for infrared and visible image
Abstract
With the rapid development of uncrewed aerial vehicle (UAV) technology, aerial image detection has found significant applications across various domains. However, existing algorithms overlook the impact of illumination on target detection, resulting in unsatisfactory detection performance under low-light conditions. We propose EfficientFuseDet, a visible and infrared image fusion detection network, to overcome this issue. First, an effective multilevel cross-modal fusion network called EfficientFuse is presented to better combine complementary information from both modalities. EfficientFuse captures local dependencies and global contextual information in shallow and deep layers, seamlessly combining complementary local and global features throughout the network. The generated fused images exhibit clear target contours and abundant texture information. Second, we propose a detection network called AFI-YOLO, which employs an inverted residual vision transformer (IRViT) backbone to effectively address the challenges associated with background interference in fused images. We design an efficient feature pyramid network (EFPN) that integrates multiscale information, enhancing multiscale detection capability for aerial images. A reparameterization decoupling head (RepHead) is proposed to further improve the precision of target classification and localization. Finally, experiments on the DroneVehicle dataset indicate that detection accuracy on fused images reaches 47.2%, exceeding the 45.0% obtained with visible-light images. Compared with state-of-the-art detection algorithms, EfficientFuseDet exhibits a slight decrease in speed; however, it demonstrates superior detection capability and effectively improves the detection accuracy of aerial images under low-light conditions.