IET Computer Vision (Jun 2016)
Multiple deep features learning for object retrieval in surveillance videos
Abstract
Efficiently indexing and retrieving objects of interest from large‐scale surveillance videos is a significant and challenging problem. In this study, the authors present an effective multiple deep features learning approach for object retrieval in surveillance videos. Using discriminative convolutional neural networks (CNNs), they learn multiple deep features that together describe a visual object comprehensively. Specifically, they use a CNN model pre‐trained on ImageNet ILSVRC12 and fine‐tuned on their own dataset to extract structure information. In addition, they train another CNN model, supervised by 11 colour names, to capture colour information. To improve retrieval performance, the deep features are encoded into short binary codes by locality‐sensitive hashing and fused so that the object of interest can be retrieved quickly. Retrieval experiments are performed on a dataset of 100k objects extracted from multi‐camera surveillance videos. Comparisons with other common visual features show the effectiveness of the proposed approach.
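The encoding and fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which locality‐sensitive hashing variant or fusion rule is used, so everything below is an assumption made for illustration: random‐hyperplane (sign) hashing, fusion by concatenating the two binary codes, ranking by Hamming distance, and toy feature sizes and code lengths. The function names (`make_hyperplanes`, `lsh_encode`, `fuse_codes`, `hamming_rank`) are hypothetical, and random vectors stand in for the CNN features.

```python
import numpy as np


def make_hyperplanes(dim, n_bits, seed):
    """Draw n_bits random hyperplanes for dim-dimensional features (assumed LSH variant)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((dim, n_bits))


def lsh_encode(features, hyperplanes):
    """Map each feature vector to a binary code: one bit per hyperplane side."""
    return (features @ hyperplanes > 0).astype(np.uint8)


def fuse_codes(structure_codes, colour_codes):
    """Fuse the two codes by simple concatenation (an assumed fusion rule)."""
    return np.concatenate([structure_codes, colour_codes], axis=1)


def hamming_rank(query_code, database_codes):
    """Rank database items by Hamming distance to the query code."""
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    return np.argsort(distances, kind="stable"), distances


# Toy stand-ins for the structure and colour CNN features (sizes are hypothetical).
rng = np.random.default_rng(0)
structure_feats = rng.standard_normal((1000, 256))
colour_feats = rng.standard_normal((1000, 64))

# Separate hash functions per feature type; code lengths are hypothetical.
planes_s = make_hyperplanes(256, 64, seed=1)
planes_c = make_hyperplanes(64, 32, seed=2)

db_codes = fuse_codes(
    lsh_encode(structure_feats, planes_s),
    lsh_encode(colour_feats, planes_c),
)

# Use one database item as the query and retrieve its nearest codes.
query_index = 42
order, distances = hamming_rank(db_codes[query_index], db_codes)
top10 = order[:10]
```

Comparing short binary codes by Hamming distance is far cheaper than comparing the original real‐valued features, which is the usual motivation for hashing before retrieval on a dataset of this size.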
Keywords