IEEE Access (Jan 2021)
Deep Semantic Hashing Using Pairwise Labels
Abstract
Data hashing has been widely used for approximate similarity search over large-scale data. Through hashing, original text data can be represented as compact binary codes. Recent advances in neural network architectures have demonstrated the effectiveness of this approach and enabled hash functions to be learned more accurately. Most previous studies have focused on encoding explicit supervised features, such as pointwise labels. Owing to the particular nature of textual data, previous semantic text hashing approaches have utilized only pointwise label information. The goal of hash code learning in the present study is to assign similar hash codes to similar or related texts. Learning from the label of each datum separately is the simplest way to achieve this objective, but it leaves some inconsistencies. Pairwise label information, by contrast, reflects similarity more intuitively than pointwise label data. This paper proposes a supervised semantic text hashing method that utilizes pairwise label information. Several different methods based on the variational autoencoder model are employed to calculate the pairwise similarity of text pairs. Because the similarity calculation requires no additional parameters, the entire learning process is faster and more efficient than those of existing methods. Experimental results on public datasets show that the proposed method exploits pairwise label information well enough to outperform previous state-of-the-art hashing approaches. This paper also describes variants that combine different techniques, analyzes their efficiency, and discusses ways to improve it.
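The abstract does not specify which similarity functions are used, so the following is only a minimal sketch of what a parameter-free pairwise similarity between text hash codes can look like. It assumes (hypothetically) a variational autoencoder encoder that outputs one logit per hash bit, relaxes each bit to a value in (-1, 1) via 2·sigmoid − 1, scores a pair by θ = ½⟨u_i, u_j⟩, and trains against a pairwise label s_ij ∈ {0, 1} with a logistic negative log-likelihood of the kind used in pairwise-label hashing. All function names are illustrative and are not taken from the paper.

```python
import numpy as np


def relaxed_codes(logits):
    """Assumed relaxation: map per-bit encoder logits to (-1, 1).

    2 * sigmoid(x) - 1 is identical to tanh(x / 2).
    """
    return np.tanh(np.asarray(logits, dtype=float) / 2.0)


def binarize(u):
    """Final hash bits: 1 where the relaxed code is positive, else 0."""
    return (np.asarray(u) > 0).astype(np.uint8)


def hamming_distance(b_i, b_j):
    """Number of differing bits between two binary hash codes."""
    return int(np.count_nonzero(np.asarray(b_i) != np.asarray(b_j)))


def pairwise_nll(u_i, u_j, s_ij):
    """Negative log-likelihood of a pairwise label s_ij in {0, 1}.

    Assumed model: P(s_ij = 1) = sigmoid(theta), theta = 0.5 * <u_i, u_j>.
    theta is computed without any learnable parameters, which is the
    property the abstract emphasizes.
    """
    theta = 0.5 * float(np.dot(u_i, u_j))
    # log(1 + exp(theta)) - s_ij * theta, computed stably
    return float(np.logaddexp(0.0, theta) - s_ij * theta)
```

For example, relaxed codes from logits `[3.0, -2.0, 1.5, -4.0]` and `[2.5, -1.0, 2.0, -3.0]` binarize to the same 4-bit code, so the loss for labeling that pair similar (s_ij = 1) is lower than for a pair whose codes point in opposite directions. Because θ depends only on the codes themselves, adding pairwise supervision in this form introduces no extra weights to learn.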
Keywords