IEEE Access (Jan 2022)

Spatial-Net for Human-Object Interaction Detection

  • Ahmed E. Mansour,
  • Ammar Mohammed,
  • Hussein Abd El Atty Elsayed,
  • Salwa Elramly

DOI
https://doi.org/10.1109/ACCESS.2022.3199380
Journal volume & issue
Vol. 10
pp. 88920 – 88931

Abstract

Human-object interaction (HOI) detection is the task of detecting a human’s relationship with an object in still images and videos. Most HOI detection methods rely on appearance features as the primary cue for detecting this relationship. Their performance also suffers from the many false-positive pairs produced by non-interactive human-object pairs in the image and by human-object mis-grouping. In this paper, we propose “Spatial-Net”, a new approach to HOI detection in still images. The approach divides the HOI problem into two main tasks: pair-prediction and global-rejection. In the pair-prediction task, spatial relationships are used to predict the interaction for each human-object pair from three kinds of spatial features: a spatial map, which is a single-channel image that represents the human-object pair, including body-part and object masks; relative geometry features, such as relative size, relative distance, and intersection-over-union between body parts and objects; and a weighted distance, which serves as a deterministic body-part attention model. In the global-rejection task, an augmented model is employed to reject false-positive pairs. We use the Hungarian matching technique to assign human-object pairs for each action, and a human-centric model to reject non-interactive human-object pairs according to the semantic co-occurrence between human and object. Experimental results on the V-COCO dataset demonstrate that the proposed Spatial-Net outperforms many state-of-the-art HOI models with less inference time.
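
The relative geometry features named in the abstract (relative size, relative distance, and intersection-over-union between a body part and an object) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: boxes are axis-aligned `(x1, y1, x2, y2)` tuples, relative size is taken as the object-to-part area ratio, and relative distance is the center-to-center distance divided by the part box diagonal. The function names are hypothetical and are not taken from the paper.

```python
from math import hypot
from typing import Dict, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def area(b: Box) -> float:
    """Area of an axis-aligned box (0 for degenerate boxes)."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def center(b: Box) -> Tuple[float, float]:
    """Center point of a box."""
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def relative_geometry(part: Box, obj: Box) -> Dict[str, float]:
    """Illustrative relative geometry features for one body-part/object pair.

    The definitions are assumptions, not the paper's formulas:
      relative_size     = area(obj) / area(part)
      relative_distance = center distance / diagonal of the part box
    """
    part_area = area(part)
    diag = hypot(part[2] - part[0], part[3] - part[1])
    (px, py), (ox, oy) = center(part), center(obj)
    dist = hypot(ox - px, oy - py)
    return {
        "relative_size": area(obj) / part_area if part_area > 0 else 0.0,
        "relative_distance": dist / diag if diag > 0 else 0.0,
        "iou": iou(part, obj),
    }
```

For example, `relative_geometry((0, 0, 2, 2), (1, 1, 3, 3))` describes a part box and an object box of equal size that overlap in one corner. Normalizing the distance by the part box size is one way to keep the feature comparable across image scales; other normalizations would be equally plausible.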

Keywords