International Journal of Information Science and Management (Jul 2023)
FarsAcademic: A Standard Persian Test Collection for Information Retrieval in Scientific Texts
Abstract
A significant amount of scientific text is produced in Persian and is available through scientific information databases on the Web. This paper presents FarsAcademic, a test collection of Persian scientific texts built for implementing and evaluating information retrieval models in academic search; it comprises 102238 documents and 61 topics. While constructing FarsAcademic, we tried to resolve problems specific to information retrieval (IR) and natural language processing (NLP) in Persian scientific texts. Domain experts were employed to create queries within their research areas, and both user relevance and topical relevance were applied to improve the precision of the relevance judgments. Further, to improve retrieval performance on Persian scientific texts, automated query expansion was applied using a relevance feedback technique, the Local Context Analysis algorithm. The results showed that query expansion outperformed the other information retrieval models in the Persian scientific text retrieval task. Finally, FarsAcademic is the only such test collection provided free of charge to Iranian information retrieval scholars, who can use it to implement and evaluate different information retrieval models and algorithms on Persian scientific text and academic search.
Keywords