Study on Text Retrieval Based on Pre-training and Deep Hash

ZOU Ao, HAO Wen-ning, JIN Da-wei, CHEN Gang, TIAN Yuan

doi:10.11896/jsjkx.210300266

Jisuanji kexue (Nov 2021)

Affiliations

ZOU Ao, HAO Wen-ning, JIN Da-wei, CHEN Gang, TIAN Yuan: Command Control Engineering College, Army Engineering University of PLA, Nanjing 210000, China

Journal volume & issue: Vol. 48, no. 11
pp. 300 – 306

Abstract

To address the low efficiency and accuracy of text retrieval, a retrieval model based on a pre-trained language model and a deep hash method is proposed. First, the prior knowledge of text contained in the pre-trained language model is introduced through transfer learning, and the input is transformed into a high-dimensional vector representation by feature extraction. A hash learning layer is then added to the back end of the model, and the model parameters are fine-tuned against specifically designed optimization objectives, so that the hash function and a unique hash representation of each input are learned dynamically during training. Experimental results show that the retrieval accuracy of this method is at least 21.70% and 21.38% higher than that of the other benchmark models at top-5 and top-10, respectively. Introducing hash codes makes retrieval 40 times faster at the cost of only 4.78% in accuracy. The method can therefore significantly improve both retrieval accuracy and efficiency, and has potential applications in the field of text retrieval.
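
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the network architecture or the optimization objectives, so the hash layer here is a random linear projection followed by a sign function, standing in for the learned hash function, and the input vectors are toy stand-ins for features from a pre-trained language model. All function names (`make_projection`, `hash_code`, `hamming`, `top_k`) are hypothetical. The sketch only shows why binary codes make retrieval cheap: comparing two codes reduces to an XOR and a bit count.

```python
import random


def make_projection(dim, n_bits, seed=0):
    """Random stand-in for a learned hash layer: n_bits rows of dim weights.

    Assumption: in the paper these weights are learned during fine-tuning;
    here they are sampled at random purely for illustration.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, projection):
    """Binarise each projected activation by sign and pack the bits into an int."""
    code = 0
    for row in projection:
        activation = sum(w * x for w, x in zip(row, vector))
        code = (code << 1) | (1 if activation >= 0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two hash codes differ."""
    return bin(a ^ b).count("1")


def top_k(query_code, corpus_codes, k):
    """Indices of the k corpus codes closest to the query in Hamming distance."""
    ranked = sorted(
        range(len(corpus_codes)),
        key=lambda i: hamming(query_code, corpus_codes[i]),
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy 4-dimensional "embeddings"; real feature vectors would be far larger.
    corpus = [[1.0, 0.2, 0.0, 0.1], [0.9, 0.3, 0.1, 0.0], [-1.0, 0.0, 0.5, -0.2]]
    projection = make_projection(dim=4, n_bits=16)
    corpus_codes = [hash_code(v, projection) for v in corpus]
    query_code = hash_code([1.0, 0.25, 0.05, 0.05], projection)
    print(top_k(query_code, corpus_codes, k=2))
```

In a learned setting the projection would be trained so that semantically similar texts receive nearby codes. The trade-off reported in the abstract (faster retrieval for a small loss in accuracy) comes from replacing floating-point similarity over high-dimensional vectors with this kind of bitwise comparison.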

Keywords: deep learning, similarity retrieval, pre-trained language model, deep hash

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
