Applied Sciences (Oct 2024)

Reinforcement Learning with Multi-Policy Movement Strategy for Weakly Supervised Temporal Sentence Grounding

  • Shan Jiang
  • Yuqiu Kong
  • Lihe Zhang
  • Baocai Yin

DOI: https://doi.org/10.3390/app14219696
Journal volume & issue: Vol. 14, no. 21, p. 9696

Abstract

Temporal grounding involves identifying the target moment in an untrimmed video based on a provided sentence. Existing weakly supervised temporal sentence grounding methods face challenges in (1) learning semantic alignment between a candidate window and the language query and (2) identifying accurate temporal boundaries during the grounding process. In this work, we propose a reinforcement learning (RL)-based multi-policy movement framework (MMF) for weakly supervised temporal sentence grounding. The framework imitates how humans ground specified content in a video, starting from a coarse location and then refining fine-grained temporal boundaries. The RL-based framework initially sets a series of candidate windows and learns to adjust them step by step by maximizing rewards that indicate the semantic alignment between the current window and the query. To better learn this alignment, we propose a Gaussian-based Dual-Alignment Module (GDAM), which combines the strengths of scoring-based and reconstruction-based alignment methods and addresses the issues of negative sample bias and language bias. We also employ a multi-policy movement strategy (MMS), which grounds the temporal position in a coarse-to-fine manner. Extensive experiments demonstrate that our proposed method outperforms existing weakly supervised algorithms, achieving state-of-the-art performance on the Charades-STA and ActivityNet Captions datasets.
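To make the step-by-step window adjustment concrete, below is a minimal, hypothetical sketch of the coarse-to-fine movement idea. It is not the authors' implementation: a greedy search over a small action set (shift, extend, shrink) stands in for the learned RL policies, and `alignment_score` is a placeholder for the learned query-window alignment reward (the role GDAM plays in the paper). All names and step sizes are illustrative assumptions.

```python
# Minimal, hypothetical sketch of the coarse-to-fine window movement idea.
# A greedy search stands in for the learned RL policies; `alignment_score`
# is a placeholder for the learned query-window alignment reward.

from typing import Callable, List, Tuple

Window = Tuple[float, float]  # (start, end) in seconds

def adjust_window(
    window: Window,
    video_len: float,
    alignment_score: Callable[[Window], float],
    coarse_step: float = 1.0,   # illustrative step sizes, not from the paper
    fine_step: float = 0.25,
    max_steps: int = 20,
) -> Window:
    """Adjust a candidate window step by step, coarse stage then fine stage,
    keeping each move only if it increases the alignment reward."""

    def moves(step: float) -> List[Callable[[Window], Window]]:
        # Candidate actions: shift the whole window, or move one boundary.
        return [
            lambda w: (w[0] - step, w[1] - step),  # shift left
            lambda w: (w[0] + step, w[1] + step),  # shift right
            lambda w: (w[0] - step, w[1]),         # extend left boundary
            lambda w: (w[0], w[1] + step),         # extend right boundary
            lambda w: (w[0] + step, w[1]),         # shrink from the left
            lambda w: (w[0], w[1] - step),         # shrink from the right
        ]

    def clip(w: Window, fallback: Window) -> Window:
        # Keep the window inside the video; discard degenerate windows.
        s = max(0.0, min(w[0], video_len))
        e = max(0.0, min(w[1], video_len))
        return (s, e) if s < e else fallback

    best, best_reward = window, alignment_score(window)
    for step in (coarse_step, fine_step):  # coarse policy, then fine policy
        for _ in range(max_steps):
            candidates = [clip(m(best), best) for m in moves(step)]
            cand = max(candidates, key=alignment_score)
            reward = alignment_score(cand)
            if reward <= best_reward:
                break  # no move improves alignment; end this stage
            best, best_reward = cand, reward
    return best
```

With a toy reward such as `lambda w: -abs(w[0] - 5.0) - abs(w[1] - 9.0)`, an initial window of `(0.0, 3.0)`, and `video_len=12.0`, the loop shifts the window right in coarse steps and then extends its right boundary, ending at `(5.0, 9.0)`.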

Keywords