A Case-Based Recommender System for Persian Scientific Document Indexing

Azadeh Mohebi; Azadeh Fakhrzdaeh; Marzieh Zarinbal

doi:10.22034/jipm.2023.704737

Iranian Journal of Information Processing & Management (Dec 2023)

Affiliations

Azadeh Mohebi: Iranian Research Institute for Information Science and Technology (IranDoc)
Azadeh Fakhrzdaeh: Iranian Research Institute for Information Science and Technology (IranDoc); Tehran, Iran
Marzieh Zarinbal: Iranian Research Institute for Information Science and Technology (IranDoc)

Journal volume & issue: Vol. 39, no. 2
pp. 599 – 626

Abstract

Keyword extraction is a key step in document indexing. Keywords are semantic, content-based descriptors of a document that can be used in document retrieval and representation. In databases of scientific documents, such as Ganj at the Iranian Research Institute for Information Science and Technology (IranDoc), assigning meaningful keywords is even more critical, since the documents come from different academic disciplines and contain technical terms. As the number of scientific documents grows exponentially, an automatic and intelligent keyword extraction technique becomes increasingly important. Existing keyword extraction techniques are based on statistical features of the text, on machine learning approaches, or on a combination of both. In this research, we propose a new keyword extraction method for Persian scientific documents based on recommender systems and case-based reasoning. Its main assumption is that similar documents share similar keywords. The proposed approach has two main steps: first, documents similar to a given new document are retrieved using TF-IDF and a word2vec model; second, candidate keywords are extracted from the retrieved documents and ranked with a new scoring scheme, and a set of keywords is selected from the candidates according to their scores. The proposed method is tested and evaluated on a set of documents from the Ganj database in three subject areas (Art, Humanities, and Engineering), based on precision, recall, and expert panel evaluation.
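
The two-step idea in the abstract (retrieve similar documents, then rank the keywords they carry) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the function names, the toy English case base, whitespace tokenization, the smoothed IDF formula, and the scoring rule (each candidate keyword accumulates the cosine similarity of the retrieved documents that contain it). The word2vec component is omitted, and the paper's own scoring scheme is not specified in the abstract.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times a smoothed IDF (an assumed variant).
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_keywords(new_text, cases, k=2, top_n=3):
    """Recommend keywords for new_text from a case base.

    cases is a list of (text, keywords) pairs, i.e. already-indexed documents.
    """
    # Step 1: retrieve the k most similar cases by TF-IDF cosine similarity.
    token_lists = [text.lower().split() for text, _ in cases]
    token_lists.append(new_text.lower().split())
    vectors = tfidf_vectors(token_lists)
    query = vectors[-1]
    sims = [(cosine(query, vec), idx) for idx, vec in enumerate(vectors[:-1])]
    neighbours = sorted(sims, reverse=True)[:k]

    # Step 2: score each candidate keyword by the summed similarity of the
    # retrieved cases that carry it (a hypothetical scoring rule).
    scores = defaultdict(float)
    for sim, idx in neighbours:
        for keyword in cases[idx][1]:
            scores[keyword] += sim
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [keyword for keyword, _ in ranked[:top_n]]


# Toy case base, invented for this example.
CASES = [
    ("neural network training for image classification",
     ["neural networks", "image classification"]),
    ("persian poetry and classical literature analysis",
     ["persian literature", "poetry"]),
    ("deep neural network for speech recognition",
     ["neural networks", "speech recognition"]),
]

print(recommend_keywords("image recognition with a neural network", CASES))
```

In this toy run, "neural networks" ranks first because both retrieved cases carry it, which mirrors the assumption that similar documents share similar keywords. Real Persian text would need proper normalization and tokenization rather than the whitespace split used here.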

Published in Iranian Journal of Information Processing & Management

ISSN: 2251-8223 (Print); 2251-8231 (Online)
Publisher: Iranian Research Institute for Information Science and Technology
Country of publisher: Iran, Islamic Republic of
LCC subjects: Bibliography. Library science. Information resources
Website: http://jipm.irandoc.ac.ir/index.php?slc_lang=en&sid=1