Sensors (Oct 2022)

Visual Relationship Detection with Multimodal Fusion and Reasoning

  • Shouguan Xiao,
  • Weiping Fu

DOI
https://doi.org/10.3390/s22207918
Journal volume & issue
Vol. 22, no. 20
p. 7918

Abstract


Visual relationship detection aims at a complete understanding of visual scenes and has recently received increasing attention. However, current methods train the semantic network on the visual features of images alone, which does not match how humans reason: we perceive the obvious features of a scene and infer its covert states using common sense. Consequently, these methods fail to predict some hidden relationships between object pairs in complex scenes. To address this problem, we propose unifying vision–language fusion with knowledge graph reasoning, combining visual feature embeddings with external common-sense knowledge to determine the visual relationships of objects. In addition, before training the relationship detection network, we devise an object–pair proposal module to avoid the combinatorial explosion of candidate pairs. Extensive experiments show that our proposed method outperforms state-of-the-art methods on the Visual Genome and Visual Relationship Detection datasets.
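The core fusion idea in the abstract, combining a visual feature embedding with an external knowledge embedding before predicting a relationship, can be sketched minimally as below. All names, dimensions, and the simple concatenate-then-linear-softmax scorer are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_and_score(visual_feat, kg_embed, w, b):
    """Concatenate a visual feature with a knowledge-graph embedding
    and score relationship classes with a linear layer plus softmax.
    This is a hypothetical stand-in for the paper's fusion module."""
    fused = np.concatenate([visual_feat, kg_embed])
    logits = w @ fused + b
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Assumed dimensions: 512-d visual feature, 128-d knowledge embedding,
# 50 predicate classes (as in common scene-graph benchmarks).
visual_feat = rng.standard_normal(512)
kg_embed = rng.standard_normal(128)
w = rng.standard_normal((50, 640)) * 0.01
b = np.zeros(50)

probs = fuse_and_score(visual_feat, kg_embed, w, b)
print(probs.shape)  # one probability per candidate predicate
```

In practice the fusion would sit inside a trained network and the knowledge embedding would come from a knowledge graph over object categories, but the sketch shows where the two information sources meet.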

Keywords