A Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder embedded pose for anomaly detection

Honglei Zhu; Pengjuan Wei; Zhigang Xu

doi:10.1049/cvi2.12257

IET Computer Vision (Apr 2024)

Affiliations

Honglei Zhu: School of Computer and Communication Lanzhou University of Technology Lanzhou Gansu China
Pengjuan Wei: School of Computer and Communication Lanzhou University of Technology Lanzhou Gansu China
Zhigang Xu: School of Computer and Communication Lanzhou University of Technology Lanzhou Gansu China

DOI: https://doi.org/10.1049/cvi2.12257
Journal volume & issue: Vol. 18, no. 3, pp. 405–419

Abstract

Because skeleton data are robust to human scale, illumination changes, dynamic camera views, and complex backgrounds, skeleton‐based video anomaly detection has made great progress in recent years. The spatio‐temporal graph convolutional network has proven effective in modelling the spatio‐temporal dependencies of non‐Euclidean data such as human skeleton graphs, and autoencoders built on this basic unit are widely used to model sequence features. However, owing to the limitations of the convolution kernel, such models cannot capture the correlation between non‐adjacent joints and have difficulty handling long‐term sequences, resulting in an insufficient understanding of behaviour. To address this issue, this paper applies the Transformer to the human skeleton and proposes the Spatio‐Temporal Enhanced Graph‐Transformer AutoEncoder (STEGT‐AE) to improve modelling capability. In addition, a multi‐memory model with skip connections is employed to provide coding features at different levels, thereby enhancing the ability of the model to distinguish similar heterogeneous behaviours. Furthermore, the STEGT‐AE has a single‐encoder, double‐decoder architecture, which can improve detection performance by combining the reconstruction and prediction errors. The experimental results show that the performance of STEGT‐AE is significantly better than that of other advanced algorithms on four baseline datasets.
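
The idea of combining the errors of the two decoders can be illustrated with a minimal sketch. The abstract does not give the scoring function, so everything here is an assumption made for illustration: the mean squared error, the weighted sum, the `weight` parameter, and the function names `mean_squared_error` and `anomaly_score` are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# One frame is a list of joints; one joint is a list of coordinates, e.g. [x, y].
Frame = Sequence[Sequence[float]]


def mean_squared_error(target: Sequence[Frame], output: Sequence[Frame]) -> float:
    """Mean squared error over all frames, joints, and coordinates."""
    total, count = 0.0, 0
    for frame_t, frame_o in zip(target, output):
        for joint_t, joint_o in zip(frame_t, frame_o):
            for c_t, c_o in zip(joint_t, joint_o):
                total += (c_t - c_o) ** 2
                count += 1
    if count == 0:
        raise ValueError("empty pose sequence")
    return total / count


def anomaly_score(
    past: Sequence[Frame],
    reconstructed: Sequence[Frame],
    future: Sequence[Frame],
    predicted: Sequence[Frame],
    weight: float = 0.5,  # assumed mixing weight, not a value from the paper
) -> float:
    """Weighted sum of reconstruction and prediction errors (illustrative only)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    reconstruction_error = mean_squared_error(past, reconstructed)
    prediction_error = mean_squared_error(future, predicted)
    return weight * reconstruction_error + (1.0 - weight) * prediction_error


if __name__ == "__main__":
    # Toy data: one frame with two 2-D joints per sequence.
    past = [[[0.0, 0.0], [1.0, 1.0]]]
    future = [[[0.0, 0.0], [2.0, 2.0]]]
    poor_prediction = [[[0.0, 0.0], [0.0, 0.0]]]
    # Perfect reconstruction, poor prediction: the score is driven by the prediction term.
    print(anomaly_score(past, past, future, poor_prediction))
```

In this sketch a higher score marks a pose sequence that the model can neither reconstruct nor predict well, which is the usual reading of autoencoder errors in anomaly detection; in practice the two terms would typically be normalised before being mixed.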

Published in IET Computer Vision

ISSN: 1751-9632 (Print); 1751-9640 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://ietresearch.onlinelibrary.wiley.com/journal/17519640
