IET Computer Vision (Aug 2019)

3D Layout encoding network for spatial‐aware 3D saliency modelling

  • Jing Yuan,
  • Yang Cao,
  • Yu Kang,
  • Weiguo Song,
  • Zhongcheng Yin,
  • Rui Ba,
  • Qing Ma

DOI
https://doi.org/10.1049/iet-cvi.2018.5591
Journal volume & issue
Vol. 13, no. 5
pp. 480 – 488

Abstract


Three‐dimensional (3D) [red, green and blue (RGB) + depth] saliency modelling can benefit popular 3D multimedia applications. However, depth images produced by existing 3D devices are often of low quality, e.g. containing noise and holes. In this study, rather than relying on features or predictions derived directly from single depth images, the authors propose to encode deep layout features to facilitate spatial‐aware saliency prediction. Specifically, they first generate coarse depth‐induced saliency cues that are insensitive to fine depth details. Then, to leverage the information in the high‐quality RGB image, they embed both low‐level and high‐level RGB deep features to refine the final prediction. In this way, they account for both bottom‐up and top‐down cues together with spatial layout, and achieve better saliency modelling results. Experiments on five public datasets show the superiority of the proposed method.
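The abstract only sketches the approach at a high level: coarse, detail-insensitive layout cues from depth, fused with low-level and high-level RGB deep features. As an illustration of that general idea, below is a minimal conceptual sketch in PyTorch. The module names, channel sizes and fusion scheme are assumptions made for illustration and are not the authors' actual network.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CoarseDepthLayoutBranch(nn.Module):
    """Encodes a (possibly noisy) depth map into a coarse spatial-layout
    feature; aggressive pooling makes it largely insensitive to fine depth
    details such as noise and holes. (Illustrative assumption.)"""

    def __init__(self, out_channels: int = 32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.AvgPool2d(kernel_size=8),                      # keep only coarse layout
            nn.Conv2d(1, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        return self.encode(depth)


class RGBDSaliencyFusion(nn.Module):
    """Fuses low-level and high-level RGB deep features (e.g. from a
    pretrained backbone) with the coarse depth-layout feature to predict a
    single-channel saliency map. (Illustrative assumption.)"""

    def __init__(self, low_c: int = 64, high_c: int = 256, layout_c: int = 32):
        super().__init__()
        self.depth_branch = CoarseDepthLayoutBranch(layout_c)
        self.fuse = nn.Sequential(
            nn.Conv2d(low_c + high_c + layout_c, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, kernel_size=1),                  # saliency logits
        )

    def forward(self, rgb_low, rgb_high, depth):
        size = rgb_low.shape[-2:]
        # bring all feature maps to the finest (low-level) spatial resolution
        rgb_high = F.interpolate(rgb_high, size=size, mode='bilinear',
                                 align_corners=False)
        layout = F.interpolate(self.depth_branch(depth), size=size,
                               mode='bilinear', align_corners=False)
        x = torch.cat([rgb_low, rgb_high, layout], dim=1)
        return torch.sigmoid(self.fuse(x))


# Example usage with dummy tensors:
model = RGBDSaliencyFusion()
rgb_low = torch.randn(1, 64, 80, 80)        # low-level RGB features
rgb_high = torch.randn(1, 256, 20, 20)      # high-level RGB features
depth = torch.randn(1, 1, 320, 320)         # raw (noisy) depth map
saliency = model(rgb_low, rgb_high, depth)  # -> (1, 1, 80, 80)

The point of the sketch is only the division of labour described in the abstract: the depth branch contributes spatial layout at a coarse scale, while the detailed appearance information comes from the high-quality RGB features.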

Keywords