IEEE Access (Jan 2015)
Comparison of Data Set Bias in Object Recognition Benchmarks
Abstract
Current research in automatic visual object recognition relies heavily on evaluating new algorithms against benchmark data sets. Such benchmarks can be collected systematically in a controlled environment (e.g., COIL-20) or compiled from images gathered from various sources, normally via the World Wide Web (e.g., Caltech 101). Here, we test for bias in benchmark data sets by cropping a small area from each image such that the area appears blank and is too small to allow manual recognition of the object. The method can detect the existence of data set bias in a single-object recognition data set and compare that bias across data sets. The results show that all tested data sets could be classified from these small sub-images with accuracy higher than mere chance, even though the sub-images contain no visually interpretable information. This demonstrates that consistency among images within the different classes of an object recognition data set can allow the images to be classified even by algorithms that do not recognize objects. Among the tested data sets, PASCAL exhibits the lowest observed bias, while data sets acquired in a controlled environment, such as COIL-20, COIL-100, and NEC Animals, are more vulnerable to bias and can be classified from the sub-images with accuracy far higher than mere chance.
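A minimal sketch of the kind of bias probe described above, not the authors' exact pipeline: crop a small, seemingly blank patch from a fixed location in each image and check whether a generic classifier can still separate the classes better than chance. The patch size, the corner location, and the choice of classifier (logistic regression with cross-validation) are assumptions made here for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def crop_patch(image, size=10):
    """Take a small corner patch that appears blank and is too small to
    allow recognition of the object (corner location is an assumption)."""
    return image[:size, :size]

def bias_score(images, labels, size=10, folds=5):
    """Classify images from their tiny patches; accuracy well above
    1/num_classes suggests data set bias rather than object recognition."""
    X = np.stack([crop_patch(img, size).reshape(-1) for img in images])
    clf = LogisticRegression(max_iter=1000)
    accuracy = cross_val_score(clf, X, labels, cv=folds).mean()
    chance = 1.0 / len(set(labels))
    return accuracy, chance

# Hypothetical usage, assuming grayscale images as 2-D NumPy arrays:
# accuracy, chance = bias_score(images, labels)
# print(f"patch accuracy {accuracy:.3f} vs. chance {chance:.3f}")
```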
Keywords