IEEE Access (Jan 2024)
Advanced Human Pose Estimation and Event Classification Using Context-Aware Features and XGBoost Classifier
Abstract
This paper presents an advanced approach to Human Pose Estimation (HPE) and Semantic Event Classification (SEC) that combines human skeleton models, context-aware feature extraction, and machine learning for precise event recognition in daily life logs. HPE, which is crucial in applications such as sports analysis and surveillance, involves predicting human joint locations from images and videos. Recent deep learning advances have significantly improved HPE, particularly in crowded scenes and under occlusion. Despite many surveys, a comprehensive review of HPE that covers recent deep learning innovations is still needed. Our research addresses this by proposing a novel HPE and SEC system. The system begins with preprocessing: videos are converted into image sequences, a sliding window is applied, images are converted to grayscale, and human silhouettes are extracted using binary masks. We use the GrabCut algorithm for human detection and perform skeletonization with the Hough transform. Keypoints are detected through pose estimation. Full-body feature extraction uses OpenPose for movable body parts, the Lucas-Kanade method for a 3D Cartesian view, and Texton Map techniques. Keypoint features are further characterized using motion histograms, pose landmark visualization, and Local Intensity Order Pattern (LIOP) features. The system is optimized with adaptive moment estimation (Adam) and classified using an XGBoost classifier. Evaluation on the COCO, UCF50, and YouTube datasets yielded classification accuracies of 92.90%, 90.9%, and 91.2%, respectively, demonstrating our approach’s superior performance compared to existing state-of-the-art techniques.
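As a minimal illustration of the preprocessing stage described above, the following Python sketch groups a frame sequence into sliding windows, converts RGB frames to grayscale, and derives a binary mask. It is not the authors' implementation. The function names, the window size and stride, the BT.601 luma weights, and the fixed intensity threshold are all assumptions made for illustration; in particular, the simple threshold is only a stand-in for the GrabCut-based silhouette extraction used in the paper.

```python
from typing import Iterator, List, Sequence, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0..255
Frame = List[List[Pixel]]             # rows of pixels
GrayFrame = List[List[int]]           # rows of intensities 0..255


def to_grayscale(frame: Frame) -> GrayFrame:
    """Convert an RGB frame to grayscale using BT.601 luma weights (assumed)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in frame
    ]


def sliding_windows(frames: Sequence, size: int, stride: int) -> Iterator[list]:
    """Yield consecutive windows of `size` frames, advancing by `stride`.

    `size` and `stride` are hypothetical parameters; the paper does not
    state the values it uses.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    for start in range(0, len(frames) - size + 1, stride):
        yield list(frames[start:start + size])


def binary_mask(gray: GrayFrame, threshold: int = 128) -> GrayFrame:
    """Mark pixels at or above `threshold` as foreground (1), else 0.

    A fixed threshold is a placeholder only; it does not reproduce GrabCut.
    """
    return [[1 if value >= threshold else 0 for value in row] for row in gray]
```

In a fuller pipeline, each window produced by `sliding_windows` would be passed on to silhouette extraction and keypoint detection; the sketch stops at the mask because the later stages depend on external models.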
Keywords