Coordinate Attention Filtering Depth-Feature Guide Cross-Modal Fusion RGB-Depth Salient Object Detection

Lingbing Meng; Mengya Yuan; Xuehan Shi; Qingqing Liu; Le Zhange; Jinhua Wu; Ping Dai; Fei Cheng

doi:10.1155/2023/9921988

Advances in Multimedia (Jan 2023)

Affiliations

Lingbing Meng: School of Anhui Institute of Information Technology
Mengya Yuan: School of Anhui Institute of Information Technology
Xuehan Shi: School of Anhui Institute of Information Technology
Qingqing Liu: School of Anhui Institute of Information Technology
Le Zhange: School of Anhui Institute of Information Technology
Jinhua Wu: School of Anhui Institute of Information Technology
Ping Dai: School of Anhui Institute of Information Technology
Fei Cheng: School of Anhui Institute of Information Technology

DOI: https://doi.org/10.1155/2023/9921988
Journal volume & issue: Vol. 2023

Abstract

Existing RGB + depth (RGB-D) salient object detection methods mainly focus on better integrating the cross-modal features of RGB images and depth maps. Many of them fuse the two modalities with the same feature interaction module, which ignores the inherent properties of each modality. In contrast, this paper proposes a novel RGB-D salient object detection method built around a depth-feature guide cross-modal fusion module that accounts for the differing properties of RGB images and depth maps. First, the depth-feature guide cross-modal fusion module is designed using coordinate attention to effectively exploit the simple data representation of depth maps. Second, a dense decoder guidance module is proposed to recover the spatial details of salient objects. Third, a context-aware content module is proposed to extract rich context information, so that multiple objects can be predicted more completely. Experimental results on six public benchmark datasets show that, compared with 15 mainstream convolutional neural network detection methods, the saliency maps produced by the proposed model have more continuous edge contours and clearer spatial structure details. Excellent results are achieved on four quantitative evaluation metrics. Finally, ablation experiments verify the effectiveness of the three proposed modules.
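
As a rough illustration of the idea named in the abstract, the sketch below shows how coordinate attention computed from a depth feature map might be used to reweight an RGB feature map. It is not the authors' module: the abstract does not give the architecture, so the residual fusion rule, the function names (`coordinate_attention`, `depth_guided_fusion`), the weight shapes, and the random weights are all assumptions, and the batch normalization and h-swish activation of the usual coordinate attention design are replaced by a plain ReLU to keep the example short.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(feat, w_reduce, w_h, w_w):
    """Direction-aware attention for a (C, H, W) feature map.

    Returns two maps in (0, 1): one of shape (C, H, 1), one of shape (C, 1, W).
    The 1x1 convolutions are written as matrix products over the channel axis.
    """
    c, h, w = feat.shape
    pooled_h = feat.mean(axis=2)                 # (C, H): average over width
    pooled_w = feat.mean(axis=1)                 # (C, W): average over height
    joint = np.concatenate([pooled_h, pooled_w], axis=1)   # (C, H + W)
    hidden = np.maximum(w_reduce @ joint, 0.0)   # (C_mid, H + W), ReLU (assumed)
    hid_h, hid_w = hidden[:, :h], hidden[:, h:]  # split back per direction
    att_h = sigmoid(w_h @ hid_h)[:, :, None]     # (C, H, 1)
    att_w = sigmoid(w_w @ hid_w)[:, None, :]     # (C, 1, W)
    return att_h, att_w

def depth_guided_fusion(rgb_feat, depth_feat, w_reduce, w_h, w_w):
    """Hypothetical fusion rule: depth-derived attention reweights RGB features,
    with a residual connection so the RGB signal is never suppressed."""
    att_h, att_w = coordinate_attention(depth_feat, w_reduce, w_h, w_w)
    return rgb_feat + rgb_feat * att_h * att_w   # broadcasts to (C, H, W)

# Toy usage with random features and weights (illustrative values only).
rng = np.random.default_rng(0)
C, H, W, C_mid = 8, 6, 5, 4
rgb = rng.standard_normal((C, H, W))
depth = rng.standard_normal((C, H, W))
w_reduce = rng.standard_normal((C_mid, C)) * 0.1
w_h = rng.standard_normal((C, C_mid)) * 0.1
w_w = rng.standard_normal((C, C_mid)) * 0.1

fused = depth_guided_fusion(rgb, depth, w_reduce, w_h, w_w)
print(fused.shape)  # (8, 6, 5)
```

Pooling separately along height and width is what lets the attention keep positional information in each direction, which is the property that distinguishes coordinate attention from a single global-pooled channel weight.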

Published in Advances in Multimedia

ISSN: 1687-5680 (Print); 1687-5699 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/6048