Intelligent Systems with Applications (Jan 2022)
Three-Stream 3D Deep CNN for No-Reference Stereoscopic Video Quality Assessment
Abstract
Convolutional Neural Networks (CNNs) have achieved great success in computer vision tasks; in particular, 3D CNNs are effective at extracting spatio-temporal features from videos. However, 3D CNNs have not been well examined for Stereoscopic Video Quality Assessment (SVQA). To the best of our knowledge, most state-of-the-art methods rely on traditional hand-crafted feature extraction for SVQA. The few methods that exploit deep learning consider only spatial information, ignoring disparity and motion. In this paper, we propose a No-Reference (NR) deep 3D CNN architecture that jointly exploits spatial, disparity, and temporal information between consecutive frames. The Three-Stream 3D CNN (3S-3DCNN) extracts features from the spatial, motion, and depth channels to estimate the quality of a stereoscopic video, capturing quality degradations along multiple dimensions. First, the scene flow, i.e., the joint prediction of optical flow and stereo disparity, is computed. The spatial information, optical flow, and disparity map of the input video are then fed to the 3S-3DCNN model. The extracted features are concatenated and passed to fully connected layers that perform the regression. For data augmentation, we split the input videos into cube patches and remove cubes that confuse the model from the training and testing sets. Two standard SVQA benchmarks, LFOVIAS3DPh2 and NAMA3DS1-COSPAD1, were used to evaluate our method. Experimental results show that the objective scores produced by 3S-3DCNN correlate strongly with subjective stereoscopic video quality scores across multiple datasets. The RMSE on the NAMA3DS1-COSPAD1 dataset is 0.2757, outperforming competing methods by a large margin. The SROCC for the blur distortion of the LFOVIAS3DPh2 dataset exceeds 0.98, indicating that 3S-3DCNN is consistent with human visual perception.
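To illustrate the three-stream design described above, the following is a minimal PyTorch sketch of a network that processes spatial (RGB), optical-flow, and disparity inputs with separate 3D-CNN branches, concatenates the resulting features, and regresses a quality score. The layer widths, kernel sizes, channel counts per stream, and the 16-frame 32x32 cube-patch shape are illustrative assumptions, not the exact configuration used in the paper.

import torch
import torch.nn as nn

class Stream3D(nn.Module):
    """One 3D-CNN branch (spatial, optical-flow, or disparity channel)."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),   # global spatio-temporal pooling
        )

    def forward(self, x):
        return self.features(x).flatten(1)   # (batch, 32) feature vector

class ThreeStream3DCNN(nn.Module):
    """Concatenates per-stream features and regresses a single quality score."""
    def __init__(self):
        super().__init__()
        self.spatial = Stream3D(in_channels=3)   # RGB frames
        self.motion = Stream3D(in_channels=2)    # optical flow (u, v)
        self.depth = Stream3D(in_channels=1)     # disparity map
        self.regressor = nn.Sequential(
            nn.Linear(32 * 3, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 1),                    # predicted quality score
        )

    def forward(self, rgb, flow, disparity):
        feats = torch.cat(
            [self.spatial(rgb), self.motion(flow), self.depth(disparity)], dim=1
        )
        return self.regressor(feats)

# Example forward pass on one cube patch: (batch, channels, frames, H, W)
model = ThreeStream3DCNN()
rgb = torch.randn(1, 3, 16, 32, 32)
flow = torch.randn(1, 2, 16, 32, 32)
disp = torch.randn(1, 1, 16, 32, 32)
score = model(rgb, flow, disp)   # shape: (1, 1)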