Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection

Bojian Chen; Wenbin Wu; Zhezhou Li; Tengfei Han; Zhuolei Chen; Weihao Zhang

doi:10.3934/era.2024031

Electronic Research Archive (Jan 2024)

Affiliations

Bojian Chen: State Grid Fujian Electric Power Research Institute, No.64 Shoushan Road, Cangshan District, Fuzhou, China
Wenbin Wu: State Grid Fujian Electric Power Research Institute, No.64 Shoushan Road, Cangshan District, Fuzhou, China
Zhezhou Li: State Grid Fujian Electric Power Research Institute, No.64 Shoushan Road, Cangshan District, Fuzhou, China
Tengfei Han: State Grid Fujian Electric Power Research Institute, No.64 Shoushan Road, Cangshan District, Fuzhou, China
Zhuolei Chen: State Grid Fujian Electric Power Research Institute, No.64 Shoushan Road, Cangshan District, Fuzhou, China
Weihao Zhang: State Grid Fujian Electric Power Research Institute, No.64 Shoushan Road, Cangshan District, Fuzhou, China

DOI: https://doi.org/10.3934/era.2024031
Journal volume & issue: Vol. 32, no. 1
pp. 643 – 669

Abstract

The goal of RGB-D salient object detection (SOD) is to aggregate information from the RGB and depth modalities to accurately detect and segment salient objects. Existing RGB-D SOD models can extract multilevel features from a single modality well and can also integrate cross-modal features, but they can rarely handle both at the same time. To exploit the correlations in intra- and inter-modality information, we propose an attention-guided cross-modal multi-feature aggregation network for RGB-D SOD. Our motivation is that both cross-modal feature fusion and multilevel feature fusion are crucial for the RGB-D SOD task. The main innovation of this work lies in two components: one is the cross-modal pyramid feature interaction (CPFI) module, which integrates multilevel features from both the RGB and depth modalities in a bottom-up manner, and the other is the cross-modal feature decoder (CMFD), which aggregates the fused features to generate the final saliency map. Extensive experiments on six benchmark datasets show that the proposed attention-guided cross-modal multiple feature aggregation network (ACFPA-Net) achieves competitive performance against 15 state-of-the-art (SOTA) RGB-D SOD methods, both qualitatively and quantitatively.
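The general idea of combining cross-modal fusion with bottom-up multilevel aggregation can be illustrated with a toy sketch. This is not the paper's CPFI or CMFD: the abstract does not specify their internals, so the gating rule, the function names (`channel_attention`, `fuse_cross_modal`, `aggregate_bottom_up`), and the assumption that every level has the same shape are all hypothetical choices made only for illustration. Features are represented as plain lists of channels, each channel a list of floats.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature):
    """Return one weight in (0, 1) per channel: sigmoid of the channel mean.

    Assumption: a simple squeeze-style gate, not the attention used in the paper.
    """
    return [sigmoid(sum(channel) / len(channel)) for channel in feature]


def fuse_cross_modal(rgb, depth):
    """Gate each modality by the other's channel attention, then add them.

    Assumption: this cross-gating rule is illustrative only.
    """
    rgb_weights = channel_attention(rgb)
    depth_weights = channel_attention(depth)
    fused = []
    for rgb_ch, depth_ch, w_rgb, w_depth in zip(rgb, depth, rgb_weights, depth_weights):
        # RGB values are scaled by the depth gate, depth values by the RGB gate.
        fused.append([w_depth * r + w_rgb * d for r, d in zip(rgb_ch, depth_ch)])
    return fused


def aggregate_bottom_up(rgb_levels, depth_levels):
    """Fuse every level and carry the previous fused result into the next level.

    Assumption: all levels share one shape, so no resampling is needed.
    """
    carried = None
    outputs = []
    for rgb, depth in zip(rgb_levels, depth_levels):
        fused = fuse_cross_modal(rgb, depth)
        if carried is not None:
            fused = [
                [f + c for f, c in zip(f_ch, c_ch)]
                for f_ch, c_ch in zip(fused, carried)
            ]
        outputs.append(fused)
        carried = fused
    return outputs


if __name__ == "__main__":
    rgb_levels = [[[0.0, 0.0]], [[0.0, 0.0]]]      # two levels, one channel each
    depth_levels = [[[2.0, 2.0]], [[2.0, 2.0]]]
    print(aggregate_bottom_up(rgb_levels, depth_levels))
```

In a real network the levels would come from a backbone at different resolutions and the fused features would be passed to a decoder that produces the saliency map; the sketch only shows the order of operations (fuse across modalities at each level, then accumulate from lower to higher levels).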

Published in Electronic Research Archive

ISSN: 2688-1594 (Online)
Publisher: AIMS Press
Country of publisher: United States
LCC subjects: Science: Mathematics; Technology: Technology (General): Industrial engineering. Management engineering: Applied mathematics. Quantitative methods
Website: https://www.aimspress.com/journal/era
