IEEE Access (Jan 2024)
Enhanced Industrial Action Recognition Through Self-Supervised Visual Transformers
Abstract
Precise recognition of operator actions is crucial in industrial automation, both for improving production efficiency and for maintaining safety standards. This study introduces a self-supervised pre-training framework based on visual transformers to address the challenge of industrial event recognition. The framework incorporates a Tube Masking strategy and leverages a comprehensive industrial dataset to capture spatiotemporal features. On our custom-built industrial dataset, the model achieved a top-1 accuracy of 95%, indicating its practical applicability in real-world industrial environments. To assess generalization, the model was also tested on several public datasets, achieving top-1 accuracies of 92.8% on UCF101, 87.1% on HMDB51, and 90.2% on Kinetics400. These results indicate that the approach is robust and versatile, and they support its application in diverse industrial scenarios and in further research.
Keywords