IEEE Access (Jan 2017)
G-CNN: Object Detection via Grid Convolutional Neural Network
Abstract
We propose an object detection system that depends on position-sensitive grid feature maps. State-of-the-art object detection networks rely on convolutional neural networks pre-trained on a large auxiliary data set (e.g., ILSVRC 2012) designed for an image-level classification task. The image-level classification task favors translation invariance, while the object detection task needs localization representations that are translation variant to an extent. To address this dilemma, we construct position-sensitive convolutional layers, called grid convolutional layers that activate the object’s specific locations in the feature maps in the form of grids. With end-to-end training, the region of interesting grid pooling layer shepherds the last set of convolutional layers to learn specialized grid feature maps. Experiments on the PASCAL VOC 2007 data set show that our method outperforms the strong baselines faster region-based convolutional neural network counterpart and region-based fully convolutional networks by a large margin. Our method applied to ResNet-50 improves the mean average precision from 74.8%/74.2% to 79.4% without any other tricks. In addition, our approach achieves similar results on different networks (ResNet-101) and data sets (PASCAL VOC 2012 and MS COCO).
Keywords