Iranian Journal of Information Processing & Management (Dec 2018)
The Analysis of the Distribution and Focus of Keywords in Theses and Dissertations and Compliance with Descriptors, Title, and Abstract
Abstract
Traditional information retrieval schemes rely on index terms provided by authors and professional indexers. Abstracts, however, ideally contain the core message of a document, which creates an opportunity to extract index terms automatically from them. This work is an effort to increase the accuracy of keyword extraction by adding a temporal weighting to candidate terms. It can also be used for research trend analysis, showing where ongoing research in Iranian theses and dissertations (TDs) is headed. To these ends, we studied more than 500 samples from different engineering research areas at 50 different universities. 1) We examined the correlation between author keywords and professional indexer keywords and observed only 8% similarity between these two sets of index terms. 2) We studied the correlation between the index terms and the words in the abstract and title. We found that 40% of author keywords are drawn from the first 20% of the abstract (45% for professional indexers) and 24% from the second 20% (19% from the next 20%). This finding can be used to narrow the input dimensions of machine learning schemes for automatic keyword extraction. 3) Using some classification schemes, we found that most ongoing research in Iran is headed toward neural networks and optimization.
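The positional finding in point 2 can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `keyword_segment_shares`, the case-insensitive verbatim substring matching, the use of each keyword's first occurrence, and the choice to divide by the total number of keywords are all assumptions made for illustration.

```python
from collections import Counter


def keyword_segment_shares(abstract, keywords, n_segments=5):
    """Return, per abstract segment, the share of keywords first seen there.

    Assumptions (not from the paper): matching is a case-insensitive
    verbatim substring search, only the first occurrence counts, and
    shares are relative to all keywords, so they sum to less than 1
    when some keywords never appear in the abstract.
    """
    text = abstract.lower()
    if not text or not keywords:
        return [0.0] * n_segments

    counts = Counter()
    for keyword in keywords:
        needle = keyword.lower().strip()
        if not needle:
            continue  # ignore empty keywords
        pos = text.find(needle)
        if pos < 0:
            continue  # keyword does not appear verbatim in the abstract
        # Map the character offset to a segment index in [0, n_segments - 1].
        segment = min(pos * n_segments // len(text), n_segments - 1)
        counts[segment] += 1

    return [counts[i] / len(keywords) for i in range(n_segments)]
```

With `n_segments=5`, each entry corresponds to one 20% slice of the abstract, so averaging the first entry over a collection of records would give a figure comparable in spirit to the 40% reported for author keywords, subject to the matching assumptions above.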