Journal of King Saud University: Computer and Information Sciences (Mar 2020)

Query expansion based on term selection for Hindi – English cross lingual IR

  • Ganesh Chandra,
  • Sanjay K. Dwivedi

Journal volume & issue
Vol. 32, no. 3
pp. 310 – 319

Abstract

Retrieving accurate information from the collection of information available on the web is a difficult task in a cross-lingual environment. To retrieve information, a user specifies the information need in the form of a query. Sometimes the query cannot express the information need precisely because of ambiguity or untranslated query words. This problem can be reduced by expanding the query with other suitable words that make it more specific. The purpose of query expansion (Q.E.) is to improve the performance and quality of the information retrieved in cross-lingual information retrieval (CLIR). In this paper, Q.E. is explored for Hindi-English CLIR, in which Hindi queries are used to search English documents. We used Okapi BM25 for document ranking, and the translated queries were then expanded using Term Selection Value (TSV). All experiments were performed on the FIRE 2012 dataset by analysing the impact of the occurrence of terms in the top 3 ranked documents. Our results show that the relevancy of the retrieved results for Hindi-English CLIR, when Q.E. is performed by adding the lowest-frequency term from the corpus of the top 3 ranked documents, is 51.33%, which is higher than the results before and after Q.E. (i.e. Case1, Case2).

Keywords: Okapi BM25, Term selection value (TSV), Query expansion, Information retrieval, Cross language information retrieval
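
The ranking and expansion steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes a common BM25 variant (with a non-negative IDF and default parameters k1 = 1.2, b = 0.75) and one formulation of TSV from the query expansion literature, TSV(t) = (f_t / N)^(r_t) * C(R, r_t), where f_t is the number of documents containing t, N the collection size, R the number of top-ranked documents, and r_t the number of those that contain t. Terms with the lowest TSV are added to the query. The function names, the whitespace tokenizer, and the choice of one expansion term are all hypothetical, and the paper's own selection rule (based on term frequency in the top 3 documents) may differ.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return document indices sorted by descending BM25 score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Assumed IDF variant; the "+ 1" keeps the value non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def term_selection_values(query, docs, ranking, top_k=3):
    """Compute the assumed TSV for each non-query term in the top_k documents."""
    tokenized = [set(tokenize(d)) for d in docs]
    n_docs = len(docs)
    df = Counter(t for d in tokenized for t in d)
    top = [tokenized[i] for i in ranking[:top_k]]
    query_terms = set(tokenize(query))
    tsv = {}
    for term in set().union(*top) - query_terms:
        r_t = sum(term in d for d in top)  # top documents containing the term
        tsv[term] = (df[term] / n_docs) ** r_t * math.comb(len(top), r_t)
    return tsv


def expand_query(query, docs, top_k=3, n_terms=1):
    """Append the n_terms candidates with the lowest TSV to the query."""
    ranking = bm25_rank(query, docs)
    tsv = term_selection_values(query, docs, ranking, top_k)
    # Ties are broken alphabetically so the result is deterministic.
    best = sorted(tsv, key=lambda t: (tsv[t], t))[:n_terms]
    return query + " " + " ".join(best) if best else query
```

In a CLIR setting such as the one described, `query` would be the English translation of the Hindi query and `docs` the English document collection; the translation step itself is outside this sketch.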