AFFINITY PROPAGATION AND K-MEANS ALGORITHM FOR DOCUMENT CLUSTERING BASED ON SEMANTIC SIMILARITY

Avan A. Mustafa; Karwan Jacksi

doi:10.25271/sjuoz.2023.11.2.1148

Science Journal of University of Zakho (Apr 2023)

Affiliations

Avan A. Mustafa: Computer Science Dept., University of Duhok, Duhok, Iraq
Karwan Jacksi: Computer Science Dept., University of Zakho, Zakho, Iraq

DOI: https://doi.org/10.25271/sjuoz.2023.11.2.1148
Journal volume & issue: Vol. 11, no. 2

Abstract

Text document clustering is the process of dividing textual material into groups, or clusters. The growth of internet technology has produced a large volume of text documents in electronic form, and document clustering has gained considerable attention as a result. Data mining methods that group these texts into meaningful clusters are becoming critical. Clustering is a branch of data mining: a blind (unsupervised) process that groups data items by similarity, with each resulting group called a cluster. However, clustering should be based on semantic similarity rather than syntactic notions; that is, documents should be clustered according to their meaning rather than their keywords. This article presents a novel strategy for categorizing articles based on semantic similarity. Document descriptions are extracted from the IMDB and Wikipedia databases, the vector space is formed using TF-IDF, and clustering is performed with the Affinity Propagation and K-means methods. The results are computed and presented on an interactive website.
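
The TF-IDF and K-means steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the cosine-based (spherical) K-means variant, the seeding of centroids from the first k documents, and the four toy documents are all assumptions made for illustration. The function names tfidf_vectors, cosine, and kmeans are hypothetical. Affinity Propagation is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, n_iter=20):
    """Cosine-based K-means; the first k vectors seed the centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each document joins its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the renormalised sum of its members.
        for c in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            if not total:
                continue  # keep the old centroid if the cluster is empty
            norm = math.sqrt(sum(w * w for w in total.values()))
            centroids[c] = {t: w / norm for t, w in total.items()}
    return labels


# Toy documents (invented for illustration, not taken from IMDB or Wikipedia).
docs = [
    "space rocket launch orbit",
    "love romance wedding couple",
    "rocket orbit astronaut space",
    "wedding romance love story",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)  # [0, 1, 0, 1]
```

Because this sketch weights surface terms only, it groups documents by shared vocabulary; capturing meaning beyond keywords, as the abstract calls for, would require a richer document representation than shown here.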

Published in Science Journal of University of Zakho

ISSN: 2663-628X (Print); 2663-6298 (Online)
Publisher: University of Zakho
Country of publisher: Iraq
LCC subjects: Science
Website: https://sjuoz.uoz.edu.krd/
