IEEE Access (Jan 2021)
Pairwise Context Similarity for Image Retrieval System Using Variational Auto-Encoder
Abstract
Deep-learning-to-hash models have recently achieved several breakthroughs, enabling fast and efficient image retrieval systems. Pairwise label similarity, which considers two images to be identical if their labels are identical, plays a crucial role as supervision for these models. However, models that use only pairwise label similarity cannot incorporate the rich contextual information in images, because this similarity depends solely on labels. In this paper, we first discuss two major limitations of using pairwise label similarity as the only supervision for a deep-learning-to-hash model. We then propose a novel pairwise context similarity to alleviate those limitations. The proposed pairwise context similarity is computed on the latent space of a Variational Auto-Encoder that is trained in an unsupervised fashion, without any label information. Moreover, we propose an auxiliary loss strategy for deep-learning-to-hash models that can easily be combined with previous losses based on pairwise label similarity without deteriorating retrieval quality. In experiments on three standard benchmark datasets, the proposed method achieved high retrieval quality on image retrieval tasks while also showing advantages with regard to the two limitations discussed. We also show empirically that the proposed loss term acts as a regularizer during training, helping to mitigate overfitting and stabilize the training curves.
Keywords