Viewing Bias Matters in 360° Videos Visual Saliency Prediction

Peng-Wen Chen; Tsung-Shan Yang; Gi-Luen Huang; Chia-Wen Huang; Yu-Chieh Chao; Chien-Hung Lu; Pei-Yuan Wu

doi:10.1109/ACCESS.2023.3269564

IEEE Access (Jan 2023)

Affiliations

Peng-Wen Chen: Graduate Institute of Communication Engineering, National Taiwan University, Taipei City, Taiwan
Tsung-Shan Yang: Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA
Gi-Luen Huang: Graduate Institute of Communication Engineering, National Taiwan University, Taipei City, Taiwan
Chia-Wen Huang: Graduate Institute of Communication Engineering, National Taiwan University, Taipei City, Taiwan
Yu-Chieh Chao: Institute of Information Science, Academia Sinica, Taipei City, Taiwan
Chien-Hung Lu: Unusly, San Francisco, CA, USA
Pei-Yuan Wu: Department of Electrical Engineering, National Taiwan University, Taipei City, Taiwan

Journal volume & issue: Vol. 11, pp. 46084–46094

Abstract

360° video has been applied in many areas, such as immersive content, virtual tours, and surveillance systems. Compared with field-of-view prediction on planar videos, the vast amount of information contained in the omnidirectional view over the entire sphere poses an additional challenge for predicting highly salient regions in 360° videos. In this work, we propose a visual saliency prediction model that directly takes 360° video in the equirectangular format as input. Unlike previous works, which often adopted recurrent neural network (RNN) architectures for the saliency detection task, we use 3D convolutions to build a spatial-temporal encoder and generalize SphereNet kernels to construct a spatial-temporal decoder. We further study the statistical properties of the viewing biases present in 360° datasets across various video types, which gives us insight into the design of a fusion mechanism that adaptively combines the predicted saliency map with the viewing bias. The proposed model achieves state-of-the-art performance, as evidenced by empirical results on widely used 360° visual saliency datasets such as Salient360!, PVS, and Sport360.
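To make the idea of combining a predicted saliency map with a viewing bias concrete, the following is a minimal sketch and not the authors' fusion mechanism. It assumes a hypothetical equator-centred Gaussian prior over latitude on an equirectangular grid and a simple convex blend. The names `equator_bias`, `fuse`, `sigma_deg`, and `alpha` are illustrative assumptions; in particular, `alpha` is passed in by hand here, whereas the abstract describes a weighting that is chosen adaptively.

```python
import math


def normalize(grid):
    """Scale a 2D list of non-negative values so that it sums to 1."""
    total = sum(sum(row) for row in grid)
    return [[value / total for value in row] for row in grid]


def equator_bias(height, width, sigma_deg=20.0):
    """Hypothetical viewing-bias prior: a Gaussian in latitude centred on the equator.

    Rows of an equirectangular frame map linearly to latitude, from +90 degrees
    at the top to -90 degrees at the bottom. sigma_deg is an assumed spread.
    """
    rows = []
    for r in range(height):
        lat = 90.0 - (r + 0.5) * 180.0 / height  # latitude of the row centre
        weight = math.exp(-0.5 * (lat / sigma_deg) ** 2)
        rows.append([weight] * width)
    return normalize(rows)


def fuse(saliency, bias, alpha):
    """Convex blend of two maps; alpha in [0, 1] is the weight on the bias.

    Assumption: alpha is a fixed scalar supplied by the caller.
    """
    s = normalize(saliency)
    b = normalize(bias)
    return [
        [(1.0 - alpha) * sv + alpha * bv for sv, bv in zip(s_row, b_row)]
        for s_row, b_row in zip(s, b)
    ]


if __name__ == "__main__":
    bias = equator_bias(4, 8)
    uniform = [[1.0] * 8 for _ in range(4)]
    fused = fuse(uniform, bias, alpha=0.5)
    print(sum(map(sum, fused)))
```

Because both inputs are normalized before blending, the fused map remains a probability distribution over pixels for any `alpha` between 0 and 1.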

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
