Information (Jun 2019)
Multi-PQTable for Approximate Nearest-Neighbor Search
Abstract
Image retrieval or content-based image retrieval (CBIR) can be transformed into the calculation of the distance between image feature vectors. The closer the vectors are, the higher the image similarity will be. In the image retrieval system for large-scale dataset, the approximate nearest-neighbor (ANN) search can quickly obtain the top k images closest to the query image, which is the Top-k problem in the field of information retrieval. With the traditional ANN algorithms, such as KD-Tree, R-Tree, and M-Tree, when the dimension of the image feature vector increases, the computing time will increase exponentially due to the curse of dimensionality. In order to reduce the calculation time and improve the efficiency of image retrieval, we propose an ANN search algorithm based on the Product Quantization Table (PQTable). After quantizing and compressing the image feature vectors by the product quantization algorithm, we can construct the image index structure of the PQTable, which speeds up image retrieval. We also propose a multi-PQTable query strategy for ANN search. Besides, we generate several nearest-neighbor vectors for each sub-compressed vector of the query vector to reduce the failure rate and improve the recall in image retrieval. Through theoretical analysis and experimental verification, it is proved that the multi-PQTable query strategy and the generation of several nearest-neighbor vectors are greatly correct and efficient.
Keywords