Effective Learning to Rank Persian Web Content

Amir Hosein Keyhanipour

doi:10.22059/jitm.2019.284726.2377

Journal of Information Technology Management (Jun 2019)

Affiliations

Amir Hosein Keyhanipour: Assistant Professor, Computer Engineering Department, Faculty of Engineering, College of Farabi, University of Tehran, Iran.

Journal volume & issue: Vol. 11, no. 2
pp. 111 – 128

Abstract

Persian is one of the most widely used languages on the Web. Hence, the Persian Web contains invaluable information that needs to be retrieved effectively. As in other languages, ranking algorithms for Persian Web content face several challenges, such as limited applicability in real-world situations and the lack of user modeling. CF-Rank, a recently proposed learning to rank algorithm, aims to address these issues through classifier fusion. CF-Rank generates a few click-through features, which provide a compact representation of a given primitive dataset. By constructing primitive classifiers on each category of click-through features and aggregating their decisions with information fusion techniques, CF-Rank has become a successful ranking algorithm on English datasets. In this paper, CF-Rank is customized for Persian Web content. Evaluation results on the dotIR dataset indicate that the customized CF-Rank outperforms the baseline rankings. The improvement is more noticeable at the top of the ranked lists, which are the positions Web users look at most of the time. Compared with the best-performing baseline algorithm on the dotIR dataset, CF-Rank shows an improvement of 30 percent in NDCG@1 and 16.5 percent in MAP.
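
For readers unfamiliar with the two evaluation criteria named in the abstract, the following is a minimal sketch of how NDCG@k and MAP are commonly computed. It is not taken from the paper: the function names are hypothetical, the exponential gain 2^rel - 1 is one common convention that the paper may or may not use, and average precision is computed under the assumption that every relevant document for a query appears somewhere in the ranked list.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k graded relevance labels.

    `relevances` lists the labels of the documents in ranked order.
    Assumption: exponential gain (2^rel - 1) with a log2 position discount.
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_precision(relevances):
    """Average precision for one query; any label above 0 counts as relevant.

    Assumption: all relevant documents are present in the ranked list,
    so the number of hits equals the number of relevant documents.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: the mean of per-query average precision values."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

NDCG@1 depends only on the label of the first-ranked document relative to the best available label, which is why it is sensitive to improvements at the very top of the ranked list.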

Published in Journal of Information Technology Management

ISSN: 2008-5893 (Print); 2423-5059 (Online)
Publisher: University of Tehran
Country of publisher: Iran, Islamic Republic of
LCC subjects: Bibliography. Library science. Information resources: Information resources (General)
Website: https://jitm.ut.ac.ir/
