Proceedings of the XXth Conference of Open Innovations Association FRUCT (Apr 2024)
Multi-Sensor Fusion for Human Action Detection and Human Motion Prediction
Abstract
Accurately understanding and predicting human behavior is an essential prerequisite for effective human-robot interaction. Recently, there has been growing interest in multi-sensor fusion for building robust and dependable robotic platforms, especially in outdoor settings. However, the majority of current computer vision models focus on a single modality, such as LiDAR point clouds or RGB images, and often capture only one person per scene. This limitation significantly restricts how effectively robots can exploit all the data available to them. In this study, we propose using multi-sensor fusion to enhance human action detection and motion prediction by incorporating 3D pose and motion information. Our approach leverages robust human motion tracking and action detection to address problems such as inaccurate human localization and matching ambiguity, which are common in single-view RGB videos of outdoor multi-person scenes. Our method achieves strong performance on the publicly available Human-M3 dataset, demonstrating the potential of multi-sensor multi-task models in real-world robotics scenarios.
Keywords