SICE Journal of Control, Measurement, and System Integration (Dec 2024)

Effective human–object interaction recognition for edge devices in intelligent space

  • Haruhiro Ozaki,
  • Dinh Tuan Tran,
  • Joo-Ho Lee

DOI
https://doi.org/10.1080/18824889.2023.2292353
Journal volume & issue
Vol. 17, no. 1
pp. 1 – 9

Abstract

To understand human-centric images and videos, machines need the capability to detect human–object interactions. This capability has been studied using various approaches, but previous research has focused mainly on recognition accuracy on widely used open datasets. Advanced machine-learning systems that provide spatial analysis and services need a recognition model that is robust to various changes, highly extensible, and fast enough even with limited computational resources. Therefore, we propose a novel method that combines a skeleton-based method with object detection to accurately predict a set of $\langle$human, verb, object$\rangle$ triplets in a video frame while taking into account the robustness, extensibility, and light weight of the model. Training a model on perceptual elements similar to those used by humans produces sufficient accuracy for advanced social systems, even with only a small training dataset. The proposed model is trained using only the coordinates of the object and human landmarks, making it robust to various situations and lightweight compared with deep-learning methods. In the experiment, a scenario in which a human works at a desk is simulated, and an algorithm is trained on object-specific interactions. The accuracy of the proposed model is evaluated using various types of datasets.
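
As a rough illustration of training on coordinates alone, the sketch below builds a feature vector from human landmark positions and one detected object box. The abstract does not specify the feature layout, so everything here is an assumption: the function name `interaction_features`, the choice to express landmarks as offsets from the box centre, and the scaling by box size are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def interaction_features(landmarks: List[Point], box: Box) -> List[float]:
    """Express each landmark as an offset from the object's box centre,
    scaled by the box width and height (hypothetical feature layout)."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    # Guard against degenerate boxes with zero width or height.
    w = max(x_max - x_min, 1e-6)
    h = max(y_max - y_min, 1e-6)
    feats: List[float] = []
    for x, y in landmarks:
        feats.append((x - cx) / w)
        feats.append((y - cy) / h)
    return feats


# Hypothetical input: two hand landmarks and one detected object box.
features = interaction_features(
    [(120.0, 80.0), (100.0, 100.0)],
    (90.0, 70.0, 110.0, 90.0),
)
```

A vector like this could be passed to any small classifier that outputs the verb of the triplet. Because it contains no pixel data, it stays compact, which is in keeping with the lightweight goal stated above.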

Keywords