IEEE Access (Jan 2020)

A Semantic Based Approach for Topic Evaluation in Information Filtering

  • Yue Xu,
  • Hanh Nguyen,
  • Yuefeng Li

DOI
https://doi.org/10.1109/ACCESS.2020.2985079
Journal volume & issue
Vol. 8
pp. 66977 – 66988

Abstract


Topic modelling has been successfully applied in many text mining applications, such as natural language processing, information retrieval, and information filtering. In information filtering systems (IFs), the representation of user interest is the core component that determines the success of the system. Topics in a topic model generated from a user’s documents can be used to represent the user’s information interest. However, a topic model generated from a document collection is not always of high quality, because its topics might contain meaningless or ambiguous words. This ambiguity problem can degrade the performance of IFs that use a topic model to represent user information interest. Hence, a topic evaluation method that assesses the quality of the topics in a topic model is important for ensuring that the model is used effectively in text mining applications. One method for measuring the quality of a topic model is to match the topical words of the model to concepts in an ontology. However, a limitation of this method is that some topical words in an examined topic cannot be found in the mapping ontology. In this study, we propose a new model that evaluates the quality of topics by matching their topical words to concepts in an ontology. In particular, a word embedding technique is applied to deal with the ambiguity problem by finding similar concept words based on word embeddings. The assessed topics are then used in an information filtering system to filter relevant documents for a user. The proposed model was evaluated against state-of-the-art baseline models that use term-based, phrase-based, and topic-based user interest representations, as well as against several topic evaluation models. The evaluation results show that the proposed model outperforms the state-of-the-art baseline models.
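The idea of matching topical words to ontology concepts, with word embeddings as a fallback for words the ontology does not contain, can be illustrated with a minimal sketch. This is not the authors' model. The exact-match-then-nearest-concept rule, the cosine similarity threshold of 0.7, the hypothetical scoring function `topic_score` (fraction of topical words mapped to some concept), and the toy ontology and embedding vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def match_word(word, concepts, embeddings, threshold=0.7):
    """Return the ontology concept matched to `word`, or None.

    Hypothetical rule: an exact match in the concept set wins; otherwise
    fall back to the most similar concept word in embedding space, provided
    its cosine similarity reaches `threshold`.
    """
    if word in concepts:
        return word
    if word not in embeddings:
        return None  # no vector, so no embedding-based fallback is possible
    best, best_sim = None, threshold
    for concept in concepts:
        vec = embeddings.get(concept)
        if vec is None:
            continue
        sim = cosine(embeddings[word], vec)
        if sim >= best_sim:
            best, best_sim = concept, sim
    return best


def topic_score(topic_words, concepts, embeddings, threshold=0.7):
    """Hypothetical topic quality: fraction of topical words mapped to a concept."""
    if not topic_words:
        return 0.0
    matched = sum(
        1 for w in topic_words
        if match_word(w, concepts, embeddings, threshold) is not None
    )
    return matched / len(topic_words)


# Toy ontology and 3-d embeddings (illustrative values only).
concepts = {"finance", "market", "stock"}
embeddings = {
    "finance": [1.0, 0.0, 0.0],
    "market": [0.0, 1.0, 0.0],
    "stock": [0.0, 0.0, 1.0],
    "shares": [0.1, 0.0, 0.99],   # close to "stock"
    "banana": [0.5, 0.5, 0.5],    # similarity about 0.58 to every concept
}
topic = ["market", "shares", "banana", "qwerty"]
score = topic_score(topic, concepts, embeddings)
```

Here "market" matches exactly, "shares" is mapped to "stock" through its embedding, "banana" is too dissimilar to every concept, and "qwerty" has no vector, so half of the topic's words are grounded in the ontology. Plain Python lists keep the sketch dependency-free; a practical system would use pretrained embeddings and a vectorised nearest-neighbour search.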
