IEEE Access (Jan 2019)

Semantic Weighted Multi-View Clustering for Web Content

  • Xiaolong Gong,
  • Linpeng Huang,
  • Tiancheng Luo,
  • Zhiyi Ma

DOI
https://doi.org/10.1109/ACCESS.2019.2939334
Journal volume & issue
Vol. 7
pp. 128097 – 128113

Abstract


Clustering is a long-standing and important research problem. However, it remains challenging for large-scale web data drawn from different types of information sources, such as user profiles, comments, and user preferences. Each of these sources can be seen as a different view of the data, and the views often admit the same underlying clustering. In this paper, we present a novel Semantic Weighted Non-negative Matrix Factorization (SWNMF) multi-view clustering framework, which provides an efficient weighted matrix factorization framework, flexibly handles multi-view web content, and addresses the sparseness problem in the semantic space of the data. Specifically, each view of the dataset forms a huge sparse matrix, which makes the matrix decomposition non-robust and in turn degrades the accuracy of the clustering results. To address this problem, we use preference information given by users (e.g., rating values) as latent semantic information to handle the features that are unobserved for each data point, so as to resolve the sparseness problem in all view matrices. To combine the multiple views in our large corpus, the overall objective of SWNMF is to minimize the loss of weighted non-negative matrix factorization (NMF) under the $\ell_{2,1}$-norm together with a co-regularization constraint under the Frobenius norm ($F$-norm). Extensive experiments on our large-scale multi-view web datasets demonstrate the competitive performance of our solution.
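The kind of objective described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' SWNMF solver and makes several assumptions: it uses standard Lee and Seung style multiplicative updates, handles the $\ell_{2,1}$ loss by iterative reweighting (column $j$ of the residual gets weight $1/(2\lVert e_j\rVert_2)$), and models the co-regularization as a Frobenius penalty $\lambda\lVert H_v - H^*\rVert_F^2$ that pulls each view's coefficient matrix toward a consensus matrix $H^*$. The per-entry weight matrices `W_v` (for example, 1 for observed entries and a smaller value for entries filled in from ratings), the functions `l21_norm` and `multiview_weighted_nmf`, and the toy data are all hypothetical.

```python
import numpy as np


def l21_norm(E):
    """l_{2,1}-norm: sum of the l2 norms of the columns (one column per data point)."""
    return float(np.sum(np.linalg.norm(E, axis=0)))


def multiview_weighted_nmf(views, weights, k, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical sketch of co-regularized, weighted, l2,1-robust multi-view NMF.

    views:   list of non-negative arrays X_v, shape (m_v, n); columns are data points.
    weights: list of non-negative arrays W_v, same shapes as X_v (assumed per-entry confidence).
    Returns the consensus coefficients H_star (k, n) and a list of per-view (U_v, H_v).
    """
    rng = np.random.default_rng(seed)
    n = views[0].shape[1]
    Us = [rng.random((X.shape[0], k)) + eps for X in views]
    Hs = [rng.random((k, n)) + eps for _ in views]
    H_star = np.mean(Hs, axis=0)
    for _ in range(n_iter):
        for X, W, U, H in zip(views, weights, Us, Hs):
            # Iterative reweighting for the l2,1 loss: column j gets d_j = 1 / (2 ||residual_j||_2).
            R = W * (X - U @ H)
            d = 1.0 / (2.0 * np.linalg.norm(R, axis=0) + eps)
            A = (W ** 2) * d[np.newaxis, :]  # effective per-entry weights, fixed for this step
            # Multiplicative update for U (weighted least squares with weights A).
            U *= ((A * X) @ H.T) / ((A * (U @ H)) @ H.T + eps)
            # Assumption: normalize U's columns to sum 1 and rescale H so U @ H is unchanged,
            # which keeps H on a comparable scale across views.
            norms = U.sum(axis=0) + eps
            U /= norms
            H *= norms[:, np.newaxis]
            # Multiplicative update for H including the penalty lam * ||H - H_star||_F^2.
            H *= (U.T @ (A * X) + lam * H_star) / (U.T @ (A * (U @ H)) + lam * H + eps)
        # With equal lam for all views, the mean minimizes sum_v ||H_v - H_star||_F^2.
        H_star = np.mean(Hs, axis=0)
    return H_star, list(zip(Us, Hs))


# Toy demo: two views of 6 points; points 0-2 load on one feature group, points 3-5 on the other.
rng = np.random.default_rng(1)
g0 = np.array([5.0, 5.0, 5.0, 0.0, 0.0, 0.0])
g1 = g0[::-1].copy()
X1 = np.vstack([g0, g0, g1, g1]) + 0.1 * rng.random((4, 6))
X2 = np.vstack([g0, g1, g1]) + 0.1 * rng.random((3, 6))
# Hypothetical weights: every entry treated as observed.
W1, W2 = np.ones_like(X1), np.ones_like(X2)
H_star, factors = multiview_weighted_nmf([X1, X2], [W1, W2], k=2)
labels = H_star.argmax(axis=0)  # cluster = index of the largest consensus coefficient
print(labels)
```

Cluster labels are read off as the index of the largest entry in each column of the consensus matrix. Because NMF is non-convex, the result depends on the random initialization (`seed`), and the label numbering itself is arbitrary.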

Keywords