Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection

Jiajia Wu; Guangliang Han; Haining Wang; Hang Yang; Qingqing Li; Dongxu Liu; Fangjian Ye; Peixun Liu

doi:10.1109/ACCESS.2021.3126338

IEEE Access (Jan 2021)

Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection

Jiajia Wu,
Guangliang Han,
Haining Wang,
Hang Yang,
Qingqing Li,
Dongxu Liu,
Fangjian Ye,
Peixun Liu

Affiliations

Jiajia Wu: ORCiD; Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Guangliang Han: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Haining Wang: School of Police Administration, People’s Public Security University of China, Beijing, China
Hang Yang: ORCiD; Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Qingqing Li: ORCiD; Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Dongxu Liu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China
Fangjian Ye: Institute of Forensic Science, Ministry of Public Security, Beijing, China
Peixun Liu: Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun, China

DOI: https://doi.org/10.1109/ACCESS.2021.3126338
Journal volume & issue: Vol. 9
pp. 150608 – 150622

Abstract

Read online

The depth map contains abundant spatial structure cues, which makes it extensively introduced into saliency detection tasks for improving the detection accuracy. Nevertheless, the acquired depth map is often with uneven quality, due to the interference of depth sensors and external environments, posing a challenge when trying to minimize the disturbances from low-quality depth maps during the fusion process. In this article, to mitigate such issues and highlight the salient objects, we propose a progressive guided fusion network (PGFNet) with multi-modal and multi-scale attention for RGB-D salient object detection. Particularly, we first present a multi-modal and multi-scale attention fusion model (MMAFM) to fully mine and utilize the complementarity of features at different scales and modalities for achieving optimal fusion. Then, to strengthen the semantic expressiveness of the shallow-layer features, we design a multi-modal feature refinement mechanism (MFRM), which exploits the high-level fusion feature to guide the enhancement of the shallow-layer original RGB and depth features before they are fused. Moreover, a residual prediction module (RPM) is applied to further suppress background elements. Our entire network adopts a top-down strategy to progressively excavate and integrate valuable information. Compared with the state-of-the-art methods, experimental results demonstrate the effectiveness of our proposed method both qualitatively and quantitatively on eight challenging benchmark datasets.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639

About the journal

Abstract

Keywords