Jisuanji kexue yu tansuo (Sep 2021)
Survey of Video Object Detection Based on Deep Learning
Abstract
Video object detection is to solve the problem of object localization and recognition in every video frame. Compared with image object detection, video is featured by high redundancy, which contains a lot of local spatio-temporal information. With the rapid popularity of deep convolutional neural network in the field of static image object detection, it shows a great advantage over traditional methods in performance. Besides, it plays a due role in video-based object detection task. However, the current video object detection algorithms still face many challenges, such as improving and optimizing the performance of mainstream object detection algorithms, maintaining the spatiotemporal consistency of video sequences, and making detection of model lightweight. In view of the above problems and challenges, on the basis of investigating a large number of literature, this paper systematically sum-marizes the video object detection algorithm based on deep learning. Based on the basic methods like optical flow and detection, these algorithms are classified. In addition, in the angles of backbone network, algorithm structure and data sets etc., these methods are explored. Combined with the experimental results in the ImageNet VID data set, this paper analyzes the performance advantages and disadvantages of typical algorithms of this field, and the relations between these algorithms. As for video object detection, the problems to be solved as well as the future research direction are expounded and prospected. Video object detection has become a hot spot pursued by many computer vision scholars. More efficient and accurate algorithms will be proposed, and its development direction will be better and better.
Keywords