IEEE Access (Jan 2024)

Interpretable Information Visualization for Enhanced Temporal Action Detection in Videos

  • Dasom Ahn,
  • Jong-Ha Lee,
  • Byoung Chul Ko

DOI
https://doi.org/10.1109/ACCESS.2024.3438546
Journal volume & issue
Vol. 12
pp. 107385–107393

Abstract

Temporal action detection (TAD), the task of detecting actions in untrimmed videos and predicting their start and end times, is one of the most active research areas in computer vision. It is a challenging task that requires a variety of temporal cues. In this paper, we present a one-stage transformer-based TAD model using enhanced long- and short-term attention. Recognizing multiple actions in a video sequence requires an understanding of temporal continuities that encompass both long- and short-term temporal dependencies, and our model learns these dependencies with transformer-based long- and short-term temporal attention. Short-term temporal attention incorporates long-term memory when learning short-term temporal features and uses a compact long-term memory to learn the long-term memory efficiently. Long-term temporal attention applies deformable attention to dynamically select the required features from the long-term memory and learn long-term features efficiently. Furthermore, our model offers interpretability for TAD by visualizing class-specific probability changes over temporal action variations, which provides a deeper understanding of the model's decision-making process and facilitates further analysis of TAD. In experiments on the THUMOS14 and ActivityNet-1.3 datasets, the proposed model achieves improved performance compared to previous state-of-the-art models. Our code is available at https://github.com/tommy-ahn/LSTA.
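To make the two attention mechanisms described above more concrete, the following PyTorch snippet is a minimal, illustrative sketch rather than the authors' implementation (refer to the linked repository for that). It assumes that short-term features attend to a pooled compact long-term memory, and that a simplified 1-D deformable attention samples a few dynamically chosen positions from the long-term memory; the module names, the adaptive pooling used to build the compact memory, and parameters such as compact_memory_len and num_samples are assumptions made for this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ShortTermAttention(nn.Module):
    """Short-term features attend to a compact long-term memory (sketch)."""
    def __init__(self, dim, num_heads=4, compact_memory_len=32):
        super().__init__()
        self.compact_memory_len = compact_memory_len
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, short_feats, long_memory):
        # Compress the long-term memory to a fixed length; adaptive average
        # pooling stands in for the paper's compact long-term memory.
        compact = F.adaptive_avg_pool1d(
            long_memory.transpose(1, 2), self.compact_memory_len
        ).transpose(1, 2)                                   # (B, M, C)
        out, _ = self.attn(short_feats, compact, compact)   # queries = short-term
        return short_feats + out


class LongTermDeformableAttention(nn.Module):
    """Each query samples a few dynamically chosen positions from the long
    memory: a simplified 1-D stand-in for deformable attention."""
    def __init__(self, dim, num_samples=4):
        super().__init__()
        self.num_samples = num_samples
        self.offset_proj = nn.Linear(dim, num_samples)   # sampling offsets
        self.weight_proj = nn.Linear(dim, num_samples)   # sampling weights
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, queries, long_memory):
        B, T, C = queries.shape
        # Predict normalized sampling locations in [-1, 1] for each query.
        base = torch.linspace(-1, 1, T, device=queries.device).view(1, T, 1)
        loc = (base + torch.tanh(self.offset_proj(queries))).clamp(-1, 1)
        weights = self.weight_proj(queries).softmax(dim=-1)      # (B, T, S)
        # Sample the memory at the predicted temporal locations.
        mem = long_memory.transpose(1, 2).unsqueeze(2)           # (B, C, 1, L)
        grid = torch.stack([loc, torch.zeros_like(loc)], dim=-1)
        grid = grid.reshape(B, T * self.num_samples, 1, 2)
        sampled = F.grid_sample(mem, grid, align_corners=True)   # (B, C, T*S, 1)
        sampled = sampled.squeeze(-1).transpose(1, 2)
        sampled = sampled.reshape(B, T, self.num_samples, C)
        out = (weights.unsqueeze(-1) * sampled).sum(dim=2)       # (B, T, C)
        return queries + self.out_proj(out)


if __name__ == "__main__":
    B, T, L, C = 2, 64, 512, 256        # batch, short length, memory length, dim
    short_feats = torch.randn(B, T, C)
    long_memory = torch.randn(B, L, C)
    feats = ShortTermAttention(C)(short_feats, long_memory)
    feats = LongTermDeformableAttention(C)(feats, long_memory)
    print(feats.shape)                  # torch.Size([2, 64, 256])

The sampling-offset and weighting projections follow the general deformable-attention recipe (predict where to look, then mix the sampled features); how the paper actually parameterizes them may differ.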

Keywords