IEEE Access (Jan 2022)
A Smart System for Personal Protective Equipment Detection in Industrial Environments Based on Deep Learning at the Edge
Abstract
Real-time object detection is currently used to automate various tasks in industrial environments. One of the most important tasks is to improve the safety of workers by monitoring the correct use of Personal Protective Equipment (PPE) in dangerous areas. In this context, a monitoring system typically analyzes the video streams from surveillance cameras to assess PPE usage in real time. When a worker not wearing the appropriate PPE is detected, an acoustic or visual alarm is triggered automatically to raise attention and awareness. The solutions proposed so far are mostly cloud-based systems: images from the site are continuously offloaded to the cloud for analysis. This centralized architecture requires significant network bandwidth to transmit the video feeds through an internet connection that must be reliable, as a network outage would disrupt the service. In this work, we propose a system for real-time PPE detection based on video stream analysis and Deep Neural Networks (DNNs). We adopt the edge computing model, in which the application for image analysis and classification is deployed on an embedded system installed in proximity to the camera and directly connected to it. The system does not require continuous image transmission to a cloud system, thus ensuring bandwidth efficiency, reliability, and workers’ privacy. A prototype of the proposed system is developed using a low-cost commercial embedded system, i.e., a Raspberry Pi, equipped with an Intel Neural Compute Stick 2. We tested the system with five different pre-trained convolutional neural networks (CNNs), fine-tuned to detect different types of PPE, namely helmets, vests, and gloves. In our experimental evaluation, we first compared the five CNNs in terms of classification performance and inference latency. Then, we deployed each CNN on the real system and evaluated the system’s throughput, measured as the number of video frames analyzed per second.
Keywords