IEEE Access (Jan 2022)

Feature Discrimination of News Based on Canopy and KMGC-Search Clustering

  • Mobeen Shahroz
  • Muhammad Faheem Mushtaq
  • Rizwan Majeed
  • Ali Samad
  • Zaigham Mushtaq
  • Urooj Akram

DOI
https://doi.org/10.1109/ACCESS.2022.3152159
Journal volume & issue
Vol. 10
pp. 26307 – 26319

Abstract

The internet offers a vast number of news sources, and users spend considerable time searching for news that is informative and matches their interests. Clustering news articles therefore has a significant impact on serving user preferences. Unsupervised learning techniques, namely K-means clustering and spectral clustering, are proposed to categorize news articles by extracting discriminant features, helping users find informative news without wasting time. Experiments are performed on the BBC news articles dataset, which consists of 2225 news articles. The TF-IDF feature extraction technique is used with K-means clustering and spectral clustering to obtain the most similar clusters and categorize the news articles into their respective domains: sports, tech, entertainment, politics, and business. The clustering algorithms are evaluated using the adjusted Rand index, V-measure, homogeneity score, completeness score, and Fowlkes-Mallows score. The experimental results show that K-means clustering performs better than spectral clustering with the TF-IDF feature extraction approach. To improve the results further, canopy centroid selection is combined with the grid search optimization technique to optimize K-means; the resulting method is named K-Means using Grid Search based on Canopy (KMGC-Search). The experimental results show that the proposed approach is a viable method for categorizing news articles.
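
The idea of seeding K-means with canopy centroids can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no algorithmic details, so the function names (`canopy_centers`, `kmeans`), the loose and tight thresholds `t1` and `t2`, the Euclidean distance, and the deterministic choice of the first remaining point as each canopy center are all assumptions made for illustration. The paper works on TF-IDF vectors and tunes the method with a grid search; neither step is reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def canopy_centers(points, t1, t2):
    """Group points into canopies (assumed variant, see lead-in).

    t1 is the loose threshold, t2 the tight one (t1 > t2).
    Returns a list of (center_index, member_indices) pairs.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center_idx = remaining[0]  # assumption: deterministic pick
        center = points[center_idx]
        # Every remaining point within the loose threshold joins the canopy.
        members = [i for i in remaining
                   if euclidean(points[i], center) <= t1]
        canopies.append((center_idx, members))
        # Points within the tight threshold can no longer start a canopy.
        remaining = [i for i in remaining
                     if euclidean(points[i], center) > t2]
    return canopies


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd iterations starting from the given seed centroids."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centroids)):
            cluster = [p for p, lab in zip(points, labels) if lab == j]
            if cluster:  # keep the old centroid if a cluster empties
                centroids[j] = mean(cluster)
    return labels, centroids


# Toy 2-D data standing in for document vectors (hypothetical values).
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
canopies = canopy_centers(points, t1=3.0, t2=1.5)
seeds = [mean([points[i] for i in members]) for _, members in canopies]
labels, centroids = kmeans(points, seeds)
```

In this sketch the number of canopies fixes the number of clusters and the canopy means serve as initial centroids, so the result depends on `t1` and `t2`. A grid search, as named in the abstract, would try several threshold pairs and keep the one that scores best on a chosen clustering metric.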

Keywords