Sensors (Aug 2024)

A Dynamic Position Embedding-Based Model for Student Classroom Complete Meta-Action Recognition

  • Zhaoyu Shou,
  • Xiaohu Yuan,
  • Dongxu Li,
  • Jianwen Mo,
  • Huibing Zhang,
  • Jingwei Zhang,
  • Ziyong Wu

DOI
https://doi.org/10.3390/s24165371
Journal volume & issue
Vol. 24, no. 16
p. 5371

Abstract


Precise recognition of complete classroom meta-actions is a key challenge for the personalized, adaptive interpretation of student behavior, because these actions are intricate. This paper proposes a Dynamic Position Embedding-based Model for Student Classroom Complete Meta-Action Recognition (DPE-SAR), built on the Video Swin Transformer. The model uses dynamic position embedding to perform conditional positional encoding. It also incorporates a deep convolutional network to better parse the spatial structure of meta-actions. The full attention mechanism of ViT3D is used to extract latent spatial features of actions and to capture the global spatial–temporal information of meta-actions. In evaluations on public datasets and on smart classroom meta-action recognition datasets, the proposed model outperforms baseline action recognition models, confirming its advantage in meta-action recognition.

Keywords