Mirror complementary transformer network for RGB‐thermal salient object detection

Xiurong Jiang; Yifan Hou; Hui Tian; Lin Zhu

doi:10.1049/cvi2.12221

IET Computer Vision (Feb 2024)

Affiliations

Xiurong Jiang: State Key Laboratory of Networking and Switching Technology Beijing University of Posts and Telecommunications Beijing China
Yifan Hou: State Key Laboratory of Networking and Switching Technology Beijing University of Posts and Telecommunications Beijing China
Hui Tian: State Key Laboratory of Networking and Switching Technology Beijing University of Posts and Telecommunications Beijing China
Lin Zhu: School of Computer Science Beijing Institute of Technology Beijing China

Journal volume & issue: Vol. 18, no. 1, pp. 15–32

Abstract

Conventional RGB‐T salient object detection (SOD) treats the RGB and thermal modalities equally to locate the common salient regions. However, the authors observed that the rich colour and texture information of the RGB modality makes objects more prominent against the background, whereas the thermal modality records the temperature differences of the scene, so objects usually have clear and continuous edges. In this work, a novel mirror‐complementary Transformer network (MCNet) is proposed for RGB‐T SOD, which supervises the two modalities separately with a complementary set of saliency labels under a symmetrical structure. Moreover, attention‐based feature interaction and serial multiscale dilated convolution (SDC)‐based feature fusion modules are introduced so that the two modalities can complement and adjust each other flexibly. When one modality fails, the proposed model can still accurately segment the salient regions. To demonstrate the robustness of the proposed model under challenging real‐world scenes, the authors build a novel RGB‐T SOD dataset, VT723, based on a large public RGB‐T semantic segmentation dataset used in the autonomous driving domain. Extensive experiments on the benchmark datasets and VT723 show that the proposed method outperforms state‐of‐the‐art approaches, including CNN‐based and Transformer‐based methods. The code and dataset can be found at https://github.com/jxr326/SwinMCNet.

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
