IEEE Access (Jan 2023)

Reasoning in Different Directions: Triplet Learning for Scene Graph Generation

Xuecheng Sun, Zhe-Ming Lu, Zewei He, Ziqian Lu, Hao Luo

DOI: https://doi.org/10.1109/ACCESS.2023.3310544
Journal volume & issue: Vol. 11, pp. 103069–103078

Abstract


Scene graph generation aims to detect objects and their relations in images, providing structured representations for scene understanding. Mainstream approaches first detect the objects and then solve a classification task to determine the relation between each object pair, ignoring the other combinations of the subject-predicate-object triplet. In this work, we propose a triplet learning paradigm for scene graph generation: given any two elements of the triplet, the model learns to predict the third. A multi-task learning scheme equips a scene graph generation model with the triplet learning task, in which the prediction heads for the subject, object, and predicate share the same backbone and are jointly trained. The proposed method requires no additional annotation and is easy to embed in existing networks. It improves the generalizability of scene graph generation models and can therefore be applied to both biased and unbiased methods. Moreover, we introduce a new Graph Structure-Aware Transformer (GSAT) model that incorporates the structural information of the scene graph via a modified self-attention mechanism. Extensive experiments show that the proposed triplet learning consistently improves several state-of-the-art models on the Visual Genome dataset.

Keywords