IEEE Access (Jan 2022)

Background-Aware Robust Context Learning for Weakly-Supervised Temporal Action Localization

  • Jinah Kim
  • Jungchan Cho

DOI
https://doi.org/10.1109/ACCESS.2022.3183789
Journal volume & issue
Vol. 10, pp. 65315–65325

Abstract

Weakly supervised temporal action localization (WTAL) aims to localize the temporal intervals of actions in an untrimmed video using only video-level action labels. Although learning the background is an important issue in WTAL, most previous studies have not utilized background information effectively. In this study, we propose a novel method for robustly separating contexts (e.g., action-like background) from the foreground to localize action intervals more accurately. First, we detect background segments based on their estimated background probabilities, minimizing the impact of background-estimation errors. Second, we define an entropy boundary for the foreground and enforce a positive distance between this boundary and the background entropy. The background probability and entropy boundary together allow the segment-level classifier to learn the background robustly. Third, we improve the overall actionness model based on a consensus of the RGB and flow features. The results of extensive experiments demonstrate that the proposed method learns the context separately from the action, consequently achieving new state-of-the-art results on the THUMOS-14 and ActivityNet-1.2 benchmarks. We also confirm that feature adaptation helps overcome the limitation of a pretrained feature extractor on datasets that contain many background segments, such as THUMOS-14.
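To make the entropy-boundary and consensus ideas in the abstract concrete, below is a minimal PyTorch sketch written under our own assumptions; it is not the authors' implementation. The function names, the choice of the foreground boundary as a foreground-weighted mean entropy, the hinge-style margin loss, and the simple averaging of the two attention streams are all hypothetical stand-ins for the mechanisms the abstract describes.

```python
# Minimal sketch (assumptions, not the paper's code): an entropy-margin
# background loss weighted by per-segment background probability, plus a
# simple RGB/flow actionness consensus.
import torch
import torch.nn.functional as F


def segment_entropy(cas):
    """Per-segment entropy of class probabilities.

    cas: (batch, segments, classes) unnormalized class scores.
    Returns: (batch, segments) entropy values.
    """
    p = F.softmax(cas, dim=-1)
    return -(p * p.clamp_min(1e-8).log()).sum(dim=-1)


def background_entropy_loss(cas, bg_prob, margin=0.5):
    """Hinge loss pushing background segments' entropy above an assumed
    foreground entropy boundary by at least `margin` (hypothetical value).

    The foreground boundary is taken here as the mean entropy of segments
    weighted by (1 - background probability), an assumption for this sketch.
    Weighting each segment's penalty by its background probability makes
    uncertain background estimates contribute less, which reflects the
    robustness-to-estimation-errors idea in the abstract.
    """
    ent = segment_entropy(cas)                       # (B, T)
    fg_weight = 1.0 - bg_prob                        # soft foreground weight
    boundary = (fg_weight * ent).sum(dim=1) / \
        fg_weight.sum(dim=1).clamp_min(1e-8)         # (B,)
    hinge = F.relu(boundary.unsqueeze(1) + margin - ent)   # (B, T)
    return (bg_prob * hinge).sum() / bg_prob.sum().clamp_min(1e-8)


def actionness_consensus(att_rgb, att_flow):
    """Fuse RGB and optical-flow attention into one actionness score by
    simple averaging; the paper's consensus mechanism may differ."""
    return 0.5 * (att_rgb + att_flow)


if __name__ == "__main__":
    cas = torch.randn(2, 100, 20)    # 2 videos, 100 segments, 20 classes
    bg_prob = torch.rand(2, 100)     # estimated background probabilities
    print(background_entropy_loss(cas, bg_prob).item())
```

The design intent of this sketch is that high entropy marks segments the classifier cannot assign to any action class, so pushing background entropy above the foreground boundary separates context from action without frame-level labels.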

Keywords