IET Image Processing (May 2024)

Time‐attentive fusion network: An efficient model for online detection of action start

  • Xuejiao Hu,
  • Shijie Wang,
  • Ming Li,
  • Yang Li,
  • Sidan Du

DOI
https://doi.org/10.1049/ipr2.13071
Journal volume & issue
Vol. 18, no. 7
pp. 1892 – 1902

Abstract

Online detection of action start is a significant and challenging task: it requires prompt identification of action start positions and their categories in streaming video. The task is difficult because of data imbalance, similar content on either side of action boundaries, and real‐time detection requirements. Here, a novel Time‐Attentive Fusion Network is introduced to improve both detection accuracy and operational efficiency. A time‐attentive fusion module is proposed, consisting of a long‐term memory attention mechanism and a fusion feature learning mechanism, to improve spatial‐temporal feature learning. The temporal memory attention mechanism captures more effective temporal dependencies by employing weighted linear attention. The fusion feature learning mechanism combines action information from the current moment with historical data, thus enhancing the representation. The proposed method has linear complexity and is parallelizable, enabling fast training and inference. The method is evaluated on two challenging datasets: THUMOS’14 and ActivityNet v1.3. The experimental results demonstrate that the proposed method significantly outperforms existing state‐of‐the‐art methods in terms of both detection accuracy and inference speed.
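
The abstract does not spell out the attention formulation, so the following is only a minimal sketch of why a linear attention variant scales linearly with sequence length. It uses generic kernelized linear attention with the common elu(x) + 1 feature map; the function names and the per‐step weights `w` (here an exponential decay over the memory) are hypothetical stand‐ins for illustration, not the authors' method.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map often used in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def weighted_linear_attention(Q, K, V, w):
    """Generic weighted linear attention (illustrative assumption).

    Q: (N, d) queries, K: (T, d) keys, V: (T, d_v) values,
    w: (T,) non-negative weights over the T memory steps (hypothetical).
    Cost is O((N + T) * d * d_v): no N x T attention matrix is formed.
    """
    phi_q = feature_map(Q)                 # (N, d)
    phi_k = feature_map(K) * w[:, None]    # (T, d), weight applied per step
    kv = phi_k.T @ V                       # (d, d_v) summary of the memory
    z = phi_k.sum(axis=0)                  # (d,) normalizer
    return (phi_q @ kv) / (phi_q @ z + 1e-9)[:, None]

# Usage: 5 current-moment queries attending over 16 historical steps.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 4))
w = 0.9 ** np.arange(15, -1, -1)           # assumed decay: recent steps weigh more
out = weighted_linear_attention(Q, K, V, w)
```

Because the memory is summarized once in `kv` and `z`, each additional query costs a fixed amount regardless of how the pairwise scores would otherwise be laid out, which is the property that a linear‐complexity claim typically rests on.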

Keywords