International Journal of Information and Communication Technology Research (Sep 2022)

Improving Content-Based Recommender System For Clustering Documents Based on Ontology And New Hierarchical Clustering Method

  • Maryam Hourali,
  • Mansoureh Hourali

Journal volume & issue
Vol. 14, no. 3
pp. 37 – 47

Abstract

We live in what is often called the era of communication. As the volume of information on the internet grows and more news is published on news agency websites and other sources, users find it increasingly difficult to locate the information and related news they want. Recommender systems address this problem by automatically finding news and information that match users' interests and suggesting it to them. This article attempts to improve user satisfaction by refining a content-based recommender system so that it suggests better sources to its users. A clustering approach is used to carry out this improvement. We define a cluster threshold for grouping similar news items in the K-means clustering algorithm. By determining the best value of the similarity criterion and using an external knowledge base (an ontology), we generalize each word into a set of related words instead of using it alone. This approach improves the accuracy of news clustering; the resulting clusters are then used to find the news a user prefers and to suggest news to that user. Datasets play an important and influential role in recommender systems, yet no standard Persian dataset has been published so far. In this research, we attempt to collect and publish such a dataset to fill this gap. The data were crawled from the Tabnak news agency website over a period of 8 days. A profile was created and saved for each volunteer as they read their preferred news during that period. Our analysis shows that the proposed clustering approach reaches 70.2% on our dataset according to the NMI criterion. In addition, the recommender system built on the proposed clustering achieves 89.2% on the accuracy criterion, which represents an improvement of 8.5% measured in a standardized way.
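
For readers unfamiliar with the NMI (normalized mutual information) criterion mentioned in the abstract, the following is a minimal Python sketch of one common way to compute it from reference labels and cluster assignments. The abstract does not state which normalization the authors use; the arithmetic-mean form, 2·I(U;V) / (H(U) + H(V)), is assumed here, and the function names `entropy`, `mutual_information`, and `nmi` are illustrative rather than taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(true_labels, cluster_labels):
    """Mutual information between two labelings of the same items."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    mi = 0.0
    for (t, c), n_tc in joint.items():
        # p(t, c) * log( p(t, c) / (p(t) * p(c)) )
        mi += (n_tc / n) * math.log(n * n_tc / (true_counts[t] * cluster_counts[c]))
    return mi


def nmi(true_labels, cluster_labels):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("both labelings must cover the same items")
    h_true = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    if h_true + h_cluster == 0.0:
        # Both labelings put everything in a single group: treat as identical.
        return 1.0
    return 2.0 * mutual_information(true_labels, cluster_labels) / (h_true + h_cluster)
```

Under this definition, a clustering that matches the reference grouping up to a renaming of clusters scores 1, and a clustering that is statistically independent of the reference grouping scores 0.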

Keywords