大数据 (Nov 2021)
Algorithm of locality sensitive hashing bit selection based on feature selection
Abstract
Locality sensitive hashing is one of the most popular information retrieval methods, which needs to generate long hashing bits to meet the retrieval requirement.However, a long hashing bits requires huge storage space, and contains plenty of redundant hashing bits.In order to solve this problem, ten simple and efficient selection algorithms in feature engineering were adopted to extract the hashing bits which carry the largest amount of information from the long hashing bits which were generated by locality sensitive hashing, and the redundant and useless hash bits were removed.Those ten algorithms tried to capture the performance of each hashing bit or the correlation among bits, such as variance and hamming distance.During selection process, the useless or high-correlated hashing bits were removed.Then the selected hashing bits were compared with the original long hashing bits.The experimental results on four common datasets show that the selected hashing bits works as well as the original hashing bits, and their reduction ratio can reach from 30% to 70%.