The Analysis of the Distribution and Focus of Keywords in Theses and Dissertations and Compliance with Descriptors, Title, and Abstract

Ashkan Khatir; Soheil Ganjefar

Iranian Journal of Information Processing & Management (Dec 2018)

Affiliations

Ashkan Khatir: Iranian Research Institute for Information Science and Technology (IranDoc); Iran
Soheil Ganjefar: Iranian Research Institute for Information Science and Technology (IranDoc); Professor of Electrical Engineering; Department of Electrical Engineering; Faculty of Engineering; Bu-Ali Sina University; Iran

Journal volume & issue: Vol. 34, no. 1
pp. 411 – 428

Abstract

Index terms provided by authors and professional indexers are used in traditional information retrieval schemes. Abstracts, however, ideally contain the core message of a document, which creates an opportunity to use them to extract index terms automatically. This work aims to increase the accuracy of keyword extraction by adding temporal weighting to candidate keywords. In addition, it can be used for research trend analysis, showing where ongoing research in Iranian theses and dissertations (TDs) is headed. To achieve these objectives, we studied more than 500 samples from different engineering research areas at 50 universities. 1) We examined the correlation between author keywords and professional indexer keywords and observed only 8% similarity between the two. 2) We examined the correlation between index terms and the words in the abstract and title. We found that 40% of author keywords are extracted from the first 20% of the abstract (45% for professional indexers) and 24% from the second 20% (19% from the next 20%). This finding can be used to narrow down the input dimensions of machine learning schemes for automatic keyword extraction. 3) Using classification schemes, we found that most ongoing research in Iran is headed toward neural networks and optimization.
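The two measurements in the abstract, keyword similarity between authors and indexers and the share of keywords found in each 20% segment of the abstract, can be illustrated with a short Python sketch. This is a hypothetical illustration, not the authors' method: the abstract does not say how similarity or position were computed, so the Jaccard overlap, the simple word tokenization (no stemming or stop-word removal), and the function names `keyword_overlap`, `keyword_segment`, and `segment_distribution` are assumptions made here.

```python
import re
from collections import Counter
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (assumption: no stemming, no stop-word removal)."""
    return re.findall(r"\w+", text.lower())


def keyword_overlap(author_kws: List[str], indexer_kws: List[str]) -> float:
    """Jaccard similarity of two keyword sets (an assumed similarity measure)."""
    a = {k.strip().lower() for k in author_kws}
    b = {k.strip().lower() for k in indexer_kws}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def keyword_segment(abstract: str, keyword: str, n_segments: int = 5) -> Optional[int]:
    """0-based segment (by token position) where the keyword first occurs, or None."""
    tokens = tokenize(abstract)
    kw = tokenize(keyword)
    if not tokens or not kw:
        return None
    for i in range(len(tokens) - len(kw) + 1):
        if tokens[i:i + len(kw)] == kw:
            # Map the start position onto one of n_segments equal-width bins.
            return i * n_segments // len(tokens)
    return None


def segment_distribution(abstract: str, keywords: List[str], n_segments: int = 5) -> List[float]:
    """Fraction of all keywords whose first occurrence falls in each segment.

    Keywords absent from the abstract count toward the total but not toward
    any segment, so the fractions may sum to less than 1.
    """
    counts = Counter()
    for kw in keywords:
        seg = keyword_segment(abstract, kw, n_segments)
        if seg is not None:
            counts[seg] += 1
    total = len(keywords)
    return [counts[s] / total if total else 0.0 for s in range(n_segments)]


if __name__ == "__main__":
    abstract = ("neural network optimization is studied here with a genetic "
                "algorithm and results on power systems are reported at the end")
    keywords = ["neural network", "genetic algorithm", "power systems", "fuzzy logic"]
    print(segment_distribution(abstract, keywords))
    print(keyword_overlap(["Neural Network", "Optimization"],
                          ["neural network", "power systems"]))
```

In the toy example the abstract has 20 tokens, so each 20% segment holds 4 tokens; "fuzzy logic" never appears and lowers every share. Aggregating these per-abstract distributions over a corpus would yield figures of the kind reported above, under whatever tokenization and matching rules one adopts.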

Published in Iranian Journal of Information Processing & Management

ISSN: 2251-8223 (Print); 2251-8231 (Online)
Publisher: Iranian Research Institute for Information Science and Technology
Country of publisher: Iran, Islamic Republic of
LCC subjects: Bibliography. Library science. Information resources
Website: http://jipm.irandoc.ac.ir/index.php?slc_lang=en&sid=1

About the journal