IEEE Access (Jan 2023)
Object-Based Hybrid Deep Learning Technique for Recognition of Sequential Actions
Abstract
Using different objects or tools to perform activities step by step is common practice in many settings, including workplaces, households, and recreational activities. However, if the correct sequence of actions is not followed, or an object or tool is used at the wrong step, the activity can become inefficient or hazardous, so such errors must be detected to ensure safety and efficiency. These issues have attracted significant attention in recent years. Previous research has relied on body keypoints to detect actions but has not identified the objects or tools used during the activity. Without a system that identifies the objects or tools being used while a task is performed, the risk of accidents and mishaps during the process increases. This study addresses the issue by introducing an efficient and robust model that uses video data to monitor and identify daily activities, together with the objects involved, enabling real-time feedback and alerts that enhance safety and productivity. The proposed model divides the overall recognition process into two components. First, it uses the BlazePose architecture for pose estimation and interpolates missed and incorrectly detected landmarks to improve the precision of pose estimation; the resulting features are then passed to a long short-term memory (LSTM) network that identifies the actions performed during the activity. Second, it employs an enhanced YOLOv4 algorithm for object detection to accurately identify the objects used in the course of the activity. The resulting activity recognition model achieves an accuracy of 95.91% in identifying actions and a mean average precision (mAP) of 97.68% in detecting objects, and the overall model processes video at 10.47 frames per second.
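The abstract does not state how missed or incorrectly detected BlazePose landmarks are interpolated. As an illustration only, the sketch below assumes per-landmark linear interpolation over time, driven by a boolean validity mask that marks which landmarks in which frames are trustworthy. How that mask is built is also an assumption; one plausible option is thresholding BlazePose's per-landmark visibility score. The function name `interpolate_landmarks` and the array layout are hypothetical and are not presented as the authors' implementation.

```python
import numpy as np


def interpolate_landmarks(seq: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Fill invalid landmark coordinates by linear interpolation over time.

    Hypothetical helper (not the paper's code).

    seq:   array of shape (T, K, D): T frames, K landmarks, D coordinates
           (BlazePose produces 33 landmarks; D = 2 or 3 is assumed here).
    valid: boolean array of shape (T, K); False marks a landmark that was
           missed or judged to be a wrong detection in that frame.

    Returns a new float array; the input is not modified.
    """
    out = seq.astype(float).copy()
    num_frames, num_landmarks, num_coords = seq.shape
    t = np.arange(num_frames)
    for k in range(num_landmarks):
        good = valid[:, k]
        if not good.any():
            # No reliable frame for this landmark: nothing to interpolate from.
            continue
        bad = ~good
        for d in range(num_coords):
            # np.interp fills interior gaps linearly and repeats the nearest
            # valid value for gaps before the first or after the last valid frame.
            out[bad, k, d] = np.interp(t[bad], t[good], seq[good, k, d])
    return out


# Assumed way to build the mask (threshold value is hypothetical):
# valid = visibility >= 0.5   # visibility has shape (T, K)
```

Linear interpolation is the simplest gap-filling choice and adds almost no latency; for longer gaps, spline fitting or Kalman-filter smoothing are common alternatives, at the cost of extra computation per frame.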
Keywords