IEEE Access (Jan 2020)

Cross-Modal Feature Integration Network for Human Eye-Fixation Prediction in RGB-D Images

  • Wenyu Liu,
  • Wujie Zhou,
  • Ting Luo

DOI
https://doi.org/10.1109/ACCESS.2020.3036681
Journal volume & issue
Vol. 8
pp. 202765–202773

Abstract

With the advent of convolutional neural networks, research progress in visual saliency prediction has been impressive. While integrating features from different stages of the backbone network is important, feature extraction itself is equally relevant: a network may lose representative information during this step. We address the loss of spatial information and fuse features extracted from RGB and depth data for eye-fixation prediction. Specifically, we propose an asymmetric feature extraction network for RGB-D images comprising an edge guidance module (EGM) and a feature integration module (FIM). Edge guidance supports the extraction of spatial information, while feature integration merges features from RGB images with those from the corresponding depth maps. We obtain the eye-fixation prediction maps by linearly fusing the features from the backbone network with those refined by the two modules. Experimental results on NCTU and NUS, two benchmark datasets for RGB-D saliency prediction, verify the effectiveness and high performance of the proposed network compared with similar methods.
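The abstract's final fusion step, linearly combining backbone features with module-refined features, can be sketched as a weighted sum of feature maps. The function name, feature shapes, and mixing weight below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def linear_fuse(backbone_feat, refined_feat, alpha=0.5):
    """Linearly fuse two feature maps of identical shape.

    alpha weights the backbone features; (1 - alpha) weights the
    module-refined features. Both the signature and the fixed scalar
    weight are hypothetical simplifications of the fusion described
    in the abstract.
    """
    assert backbone_feat.shape == refined_feat.shape
    return alpha * backbone_feat + (1.0 - alpha) * refined_feat

# Assumed (batch, channels, height, width) shapes for illustration only.
rgb_feat = np.random.rand(1, 64, 28, 28)    # backbone RGB-stream features
depth_feat = np.random.rand(1, 64, 28, 28)  # refined depth-stream features
fused = linear_fuse(rgb_feat, depth_feat, alpha=0.6)
print(fused.shape)  # (1, 64, 28, 28)
```

In the actual network the fusion weights would be learned jointly with the EGM and FIM rather than fixed by hand; this sketch only shows the linear-combination structure.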

Keywords