IEEE Access (Jan 2021)
Illation of Video Visual Relation Detection Based on Graph Neural Network
Abstract
The visual relation detection task bridges semantic text and visual information, expressing the content of images or videos through relation triplets of the form <subject, predicate, object>. This line of research can be applied to image question answering, video captioning, and other applications, yet visual relation detection with video as input has received comparatively little attention. We therefore propose an algorithm based on a graph convolutional neural network and a multi-hypothesis tree to perform video relation prediction. The video visual relation detection algorithm consists of three steps. First, the motion trajectories of the subjects and objects in the input video segment are generated. Second, a VRGE network module based on the graph convolutional neural network is proposed to predict the relations between objects in the video segment. Finally, relation triplets are formed by combining the predicted relations through the multi-hypothesis fusion (MHF) algorithm. We evaluate our method on the benchmark ImageNet-VidVRD dataset. The experimental results demonstrate that our method achieves an accuracy of 29.05% and a recall of 10.18% for visual relation detection.
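To make the three-step pipeline summarized above concrete, the following is a minimal Python sketch of its overall control flow. All names (generate_trajectories, vrge_predict, multi_hypothesis_fusion, the Trajectory structure, and the segment handling) are assumptions introduced for illustration only; they are not the authors' implementation, and the stubs stand in for the detector/tracker, the VRGE graph-convolution module, and the MHF association step described in the paper.

```python
# Illustrative sketch of the three-step video visual relation detection
# pipeline: (1) trajectory generation, (2) per-segment relation prediction,
# (3) multi-hypothesis fusion into video-level triplets.
# All function and type names are hypothetical placeholders.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    """Bounding-box track of one object across a video segment."""
    object_id: int
    category: str
    boxes: List[Tuple[float, float, float, float]]  # (x1, y1, x2, y2) per frame


def generate_trajectories(segment_frames: List[object]) -> List[Trajectory]:
    """Step 1 (stub): detect and track subject/object candidates in a segment."""
    # A real system would run an object detector plus a tracker here.
    return []


def vrge_predict(trajectories: List[Trajectory]) -> List[Tuple[int, str, int, float]]:
    """Step 2 (stub): predict per-segment relations with a graph-based module.

    Returns (subject_id, predicate, object_id, score) tuples; the actual VRGE
    module would build a graph over trajectories and apply graph convolutions.
    """
    return []


def multi_hypothesis_fusion(
    segment_relations: List[List[Tuple[int, str, int, float]]]
) -> List[Tuple[int, str, int]]:
    """Step 3 (stub): merge per-segment hypotheses into video-level triplets."""
    merged = {}
    for relations in segment_relations:
        for subj, pred, obj, score in relations:
            key = (subj, pred, obj)
            merged[key] = merged.get(key, 0.0) + score  # accumulate evidence
    return [key for key, _ in sorted(merged.items(), key=lambda kv: -kv[1])]


def detect_video_relations(segments: List[List[object]]) -> List[Tuple[int, str, int]]:
    """Run the full pipeline over all video segments."""
    per_segment = []
    for frames in segments:
        trajectories = generate_trajectories(frames)
        per_segment.append(vrge_predict(trajectories))
    return multi_hypothesis_fusion(per_segment)
```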
Keywords