IEEE Access (Jan 2024)

OFF-ViNet: Optical Flow-Based Feature Warping ViNet for Video Saliency Prediction Considering Future Prediction

  • Reita Ikenoya,
  • Tomonori Tashiro,
  • Gosuke Ohashi

DOI
https://doi.org/10.1109/ACCESS.2024.3394222
Journal volume & issue
Vol. 12
pp. 66921 – 66930

Abstract


Video saliency prediction, which predicts human visual attention in videos, has been actively studied. Most deep learning-based video saliency prediction models learn the features that contribute to saliency prediction implicitly, and this has greatly improved accuracy. This study proposes a model called optical flow-based feature warping ViNet (OFF-ViNet). In addition to the implicitly learned features, the model explicitly incorporates a Warping module, a mechanism that accounts for future predictions based on object motion. The Warping module spatially warps the hierarchical features extracted by the 3D convolutional backbone according to the optical flow, yielding a feature representation that anticipates the future. OFF-ViNet achieves accuracy better than or competitive with that of state-of-the-art models on video saliency prediction datasets, and performs particularly well on UCF-Sports, which contains several videos with moving objects.
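The core operation behind a Warping module of this kind, displacing a feature map by an optical flow field, can be illustrated with a minimal NumPy sketch of generic bilinear backward warping. This is not the authors' implementation: the abstract does not state the flow estimator, the flow direction, the interpolation or padding scheme, or how warped features are fused back into ViNet. The function name `warp_features`, the (C, H, W) feature layout, the (dx, dy) flow layout, and zero padding outside the map are all assumptions made for illustration.

```python
import numpy as np


def warp_features(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a (C, H, W) feature map with a (2, H, W) flow field.

    Hypothetical helper, not from the paper. flow[0] is the horizontal
    displacement (dx) and flow[1] the vertical displacement (dy), in pixels.
    Output pixel (y, x) is bilinearly sampled from feat at (y + dy, x + dx);
    neighbours that fall outside the map contribute zero.
    """
    c, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = xs + flow[0]  # source x coordinate for every output pixel
    sy = ys + flow[1]  # source y coordinate for every output pixel

    # Integer corners of the bilinear cell and fractional offsets inside it.
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = x0 + 1
    y1 = y0 + 1
    wx = sx - x0
    wy = sy - y0

    out = np.zeros_like(feat, dtype=float)
    # Accumulate the four weighted neighbours, masking out-of-bounds ones.
    for yy, xx, wgt in ((y0, x0, (1 - wy) * (1 - wx)),
                        (y0, x1, (1 - wy) * wx),
                        (y1, x0, wy * (1 - wx)),
                        (y1, x1, wy * wx)):
        valid = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        yc = np.clip(yy, 0, h - 1)  # clipped indices are safe to gather
        xc = np.clip(xx, 0, w - 1)
        out += feat[:, yc, xc] * (wgt * valid)
    return out


# Usage: shift a 1x1x4 feature map by one pixel, then by half a pixel.
feat = np.arange(4, dtype=float).reshape(1, 1, 4)
flow = np.zeros((2, 1, 4))
flow[0] = 1.0
print(warp_features(feat, flow))  # [[[1. 2. 3. 0.]]]
flow[0] = 0.5
print(warp_features(feat, flow))  # [[[0.5 1.5 2.5 1.5]]]
```

Under the assumption that the flow points from a future frame back to the current one, sampling current features at the displaced positions gives an estimate of what those features would look like one step ahead, which is the intuition the abstract describes. Deep learning frameworks usually provide this operation as a differentiable sampling primitive so it can be applied to each level of a hierarchical backbone during training.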

Keywords