IEEE Access (Jan 2024)

Enhanced Industrial Action Recognition Through Self-Supervised Visual Transformers

  • Yao Xiao
  • Hua Xiang
  • Tongxi Wang
  • Yiju Wang

DOI
https://doi.org/10.1109/ACCESS.2024.3455749
Journal volume & issue
Vol. 12
pp. 134133 – 134143

Abstract

Precise recognition of operator actions is crucial in industrial automation for enhancing production efficiency and ensuring safety standards. This study introduces a novel self-supervised pre-training framework using visual transformers to address the challenge of industrial event recognition. The framework incorporates an innovative Tube Masking strategy and leverages a comprehensive industrial dataset to effectively capture spatiotemporal features. Evaluation on our custom-built industrial dataset revealed a top-1 accuracy of 95%, demonstrating the model’s practical applicability in real-world industrial environments. To further assess the model’s generalization capabilities, it was tested on several public datasets, achieving top-1 accuracies of 92.8% on UCF101, 87.1% on HMDB51, and 90.2% on Kinetics400. These results highlight the robustness and versatility of our approach, paving the way for its application in diverse industrial scenarios and further research.
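
The abstract names a Tube Masking strategy but does not describe it. In masked video pre-training, the term usually means sampling one spatial mask over the patch grid and repeating it across every time step, so each masked patch forms a "tube" through time. The sketch below illustrates that general idea only and is not the authors' implementation; the function name `tube_mask`, the 14 x 14 patch grid, the 8 temporal steps, and the 0.9 mask ratio are all illustrative assumptions that do not come from the paper.

```python
import random


def tube_mask(num_frames, grid_h, grid_w, mask_ratio, seed=None):
    """Return a [num_frames][grid_h * grid_w] list of booleans (True = masked).

    One spatial mask is sampled and repeated for every temporal index,
    so each masked patch position forms a "tube" through time.
    All parameter values used below are hypothetical.
    """
    rng = random.Random(seed)
    num_patches = grid_h * grid_w
    num_masked = int(round(mask_ratio * num_patches))

    # Sample which spatial patch positions are hidden (without replacement).
    masked_positions = set(rng.sample(range(num_patches), num_masked))
    spatial_mask = [i in masked_positions for i in range(num_patches)]

    # Repeat the same spatial mask at every temporal index.
    return [list(spatial_mask) for _ in range(num_frames)]


# Hypothetical settings: 8 temporal steps, 14 x 14 patch grid, 90% masked.
mask = tube_mask(num_frames=8, grid_h=14, grid_w=14, mask_ratio=0.9, seed=0)
visible_per_frame = len(mask[0]) - sum(mask[0])
```

Because the mask is identical in every frame, a hidden patch cannot be recovered simply by copying the same location from a neighbouring frame, which is the usual motivation given for masking tubes rather than masking each frame independently.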

Keywords