IEEE Access (Jan 2021)

Automatic CNN-Based Enhancement of 360° Video Experience With Multisensorial Effects

  • John Patrick Sexton,
  • Anderson Augusto Simiscuka,
  • Kevin McGuinness,
  • Gabriel-Miro Muntean

DOI
https://doi.org/10.1109/ACCESS.2021.3115701
Journal volume & issue
Vol. 9
pp. 133156–133169

Abstract

High-resolution audio-visual virtual reality (VR) technologies currently offer satisfying multimedia experiences for the senses of sight and hearing. However, delivering truly immersive experiences requires the incorporation of other senses such as touch and smell. Multisensorial effects are usually synchronized with videos manually, with the effect data stored in companion files that contain timestamps for these effects. This manual task becomes very complex for 360° videos, as the scenes triggering effects can occur in different viewpoints. The solution proposed in this paper automatically adds extra sensory information to immersive 360° videos. A novel scent prediction scheme using Convolutional Neural Networks (CNN) performs scene predictions on 360° videos represented in the Equi-Angular Cubemap format in order to add scents relevant to the detected content. Digital signal processing with a Root Mean Square (RMS) function is used to detect loud sounds in the video, which are then associated with haptic feedback. A prototype was developed that outputs multisensorial stimuli using an olfaction dispenser and a haptic mouse. The proposed solution was tested and achieved excellent results in terms of accuracy of scene detection, olfaction latency, and correct execution of the relevant effects. Several CNN architectures, including AlexNet, ResNet18, and ResNet50, were assessed comparatively, achieving a labeling accuracy of up to 72.67% for olfaction-enhanced media.
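
As a rough illustration of the scene-prediction step described above, the sketch below classifies a single Equi-Angular Cubemap face with a ResNet18 backbone. The scent label set, the `predict_scent` helper, and the input resolution are hypothetical placeholders; the paper's actual classes, training procedure, and fine-tuned weights are not reproduced here.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Hypothetical scent categories; the paper's actual label set is not given here.
SCENT_LABELS = ["forest", "ocean", "coffee", "smoke"]

# Standard ImageNet-style preprocessing for a ResNet backbone.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# ResNet18 with its classifier head replaced for the scent classes;
# in practice, fine-tuned weights would be loaded before inference.
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, len(SCENT_LABELS))
model.eval()

def predict_scent(face_image: Image.Image) -> str:
    """Classify one cubemap face into a scent category."""
    x = preprocess(face_image).unsqueeze(0)  # add batch dimension
    with torch.no_grad():
        logits = model(x)
    return SCENT_LABELS[int(logits.argmax())]
```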
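
Similarly, a minimal sketch of RMS-based loud-sound detection is given below, assuming a mono waveform normalized to [-1, 1]. The window length and threshold are illustrative values rather than the parameters used in the paper; the detected timestamps would then be mapped to haptic feedback events.

```python
import numpy as np

def detect_loud_events(samples: np.ndarray, sample_rate: int,
                       window_s: float = 0.1, threshold: float = 0.3):
    """Return timestamps (in seconds) of windows whose RMS level exceeds a threshold."""
    win = int(sample_rate * window_s)
    events = []
    for start in range(0, len(samples) - win + 1, win):
        frame = samples[start:start + win]
        rms = np.sqrt(np.mean(frame ** 2))  # root mean square energy of the window
        if rms > threshold:
            events.append(start / sample_rate)
    return events
```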

Keywords