IEEE Access (Jan 2018)

Entity-Based Language Model Smoothing Approach for Smart Search

  • Feng Zhao,
  • Zeliang Tian,
  • Hai Jin

DOI
https://doi.org/10.1109/ACCESS.2017.2788417
Journal volume & issue
Vol. 6
pp. 9991 – 10002

Abstract

Smart search plays an important role in all walks of life. For example, accurately retrieving the knowledge a business needs from massive resources is an important way to enhance industrial intelligence. Smoothing the language model is essential for obtaining high-quality search results because it reduces the mismatching and overfitting problems caused by data sparseness. Traditional smoothing methods rely lexically on the global corpus and on locally clustered document information, without semantic analysis, and therefore fail to capture the semantic correlations between query statements and documents. In this paper, we propose an entity-based language model smoothing approach for smart search that exploits semantic correlation and uses entities as bridges to build an entity semantic language model from a knowledge base. In this approach, entities in the documents are linked to an external knowledge base, such as Wikipedia. The entity semantic language model is then generated using soft-fused and hard-fused methods. A two-level merging strategy, which integrates Dirichlet (Dir) smoothing and Jelinek-Mercer (JM) smoothing, is also presented to smooth the language model according to whether or not a given word is semantically relevant to the document. Experimental results show that the smoothed language model more closely approximates the word probability distribution under the document's semantic theme and more accurately estimates the relevance between query and document.
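
For orientation, the two standard smoothing methods named in the abstract can be sketched as follows. This is a minimal illustration of textbook Jelinek-Mercer and Dirichlet-prior smoothing only; it is not the paper's two-level merging strategy or its entity semantic language model, whose details the abstract does not give. The function names, the `background` dictionary, and the default values of `lam` and `mu` are assumptions made for the example.

```python
from collections import Counter


def jm_smooth(word, doc_counts, doc_len, background, lam=0.5):
    """Jelinek-Mercer smoothing: linearly interpolate the document's
    maximum-likelihood estimate with a background model.
    `lam` is the weight given to the background model (assumed default)."""
    p_ml = doc_counts[word] / doc_len if doc_len else 0.0
    return (1.0 - lam) * p_ml + lam * background.get(word, 0.0)


def dirichlet_smooth(word, doc_counts, doc_len, background, mu=2000.0):
    """Dirichlet-prior smoothing: the background model contributes
    `mu` pseudo-counts, so longer documents are smoothed less."""
    return (doc_counts[word] + mu * background.get(word, 0.0)) / (doc_len + mu)


# Toy document and a hypothetical background distribution
# (partial vocabulary, for illustration only).
doc = ["smart", "search", "search"]
doc_counts = Counter(doc)  # missing words count as 0
doc_len = len(doc)
background = {"smart": 0.1, "search": 0.2, "entity": 0.05}

p_jm = jm_smooth("search", doc_counts, doc_len, background, lam=0.5)
p_dir = dirichlet_smooth("entity", doc_counts, doc_len, background, mu=10.0)
```

In both functions an unseen word such as "entity" still receives non-zero probability from the background model, which is how smoothing counters the data-sparseness problem the abstract describes. In the approach summarized above, the background role would presumably be played in part by the entity semantic language model rather than by corpus statistics alone.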

Keywords