IET Computer Vision (Mar 2024)

RGB depth salient object detection via cross‐modal attention and boundary feature guidance

  • Lingbing Meng,
  • Mengya Yuan,
  • Xuehan Shi,
  • Le Zhang,
  • Qingqing Liu,
  • Dai Ping,
  • Jinhua Wu,
  • Fei Cheng

DOI
https://doi.org/10.1049/cvi2.12244
Journal volume & issue
Vol. 18, no. 2
pp. 273 – 288

Abstract

RGB depth (RGB‐D) salient object detection (SOD) is a meaningful and challenging task. Convolutional neural networks achieve good detection performance in simple scenes, but they cannot effectively handle scenes in which salient objects have complex contours or are similar in colour to the background. A novel end‐to‐end framework is proposed for RGB‐D SOD, which comprises four main components: the cross‐modal attention feature enhancement (CMAFE) module, the multi‐level contextual feature interaction (MLCFI) module, the boundary feature extraction (BFE) module, and the multi‐level boundary attention guidance (MLBAG) module. The CMAFE module retains the more effective salient features by employing a dual‐attention mechanism to filter noise from the two modalities. In the MLCFI module, a shuffle operation is applied to high‐level and low‐level channels to promote cross‐channel information communication, and rich semantic information is extracted. The BFE module converts salient features into boundary features to generate boundary maps. The MLBAG module produces saliency maps by aggregating multi‐level boundary saliency maps to guide cross‐modal features in the decoding stage. Extensive experiments are conducted on six public benchmark datasets, and the results demonstrate that the proposed model significantly outperforms 23 state‐of‐the‐art RGB‐D SOD models on multiple evaluation metrics.
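The channel shuffle mentioned for the MLCFI module is not specified in detail here; a common formulation (popularised by ShuffleNet) interleaves channels across groups so that subsequent group-wise operations mix information between channel groups. A minimal NumPy sketch of that standard operation, assuming a `(batch, channels, height, width)` feature layout and a hypothetical `groups` parameter, is:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """Interleave channels across `groups` to enable cross-channel
    information exchange (standard ShuffleNet-style shuffle; the paper's
    exact variant may differ)."""
    b, c, h, w = x.shape
    assert c % groups == 0, "channel count must be divisible by groups"
    # Split channels into groups, swap the group and per-group axes,
    # then flatten back: channel i*g+j maps to position j*groups+i.
    x = x.reshape(b, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(b, c, h, w)

# Example: 4 channels labelled 0..3, shuffled with 2 groups
feat = np.arange(4, dtype=float).reshape(1, 4, 1, 1)
shuffled = channel_shuffle(feat, groups=2)
print(shuffled.flatten().tolist())  # → [0.0, 2.0, 1.0, 3.0]
```

Applied to concatenated high-level and low-level channels, this interleaving places channels from the two levels adjacent to one another, which is one plausible way to realise the cross-channel communication the abstract describes.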

Keywords