IEEE Access (Jan 2018)
Nonnegative Matrix Factorization Based Consensus for Clusterings With a Variable Number of Clusters
Abstract
Consensus clustering aggregates a set of base clusterings into an ensemble clustering that is better than the individual base clusterings. It is beneficial for identifying clusters in heterogeneous data. This paper presents a new approach that generates a set of good-quality base clusterings and aggregates them into a single clustering solution. The approach consists of two phases. In the first phase, we present a new tree-based $k$-means algorithm to build different base clusterings. The algorithm builds a cluster-tree, which gives one base clustering. The tree generation process uses two stopping criteria that are based on the underlying data distribution of the data set. We vary the value of the input parameter of the tree generation algorithm to produce multiple cluster-trees, where each tree gives a base clustering with a variable number of clusters. In the second phase, we propose a new nonnegative matrix factorization-based consensus method to combine the base clusterings into a final clustering. We investigate the quality and diversity of the base clusterings, which often have a large influence on the performance of consensus clustering. Experimental results on various real-world and synthetic data sets demonstrate that the proposed algorithm outperforms well-known algorithms in terms of clustering accuracy.
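To make the second phase more concrete, the following Python sketch shows one generic way to combine base clusterings that have different numbers of clusters using nonnegative matrix factorization. It is an illustration under stated assumptions, not the method proposed in the paper: it builds a co-association matrix M (the fraction of base clusterings in which two points share a cluster), approximates it as M ≈ HHᵀ with a damped multiplicative update for symmetric NMF, and assigns each point to the largest entry in its row of H. The function names, the choice to initialize H from one of the base clusterings, and the parameters `iters` and `eps` are hypothetical.

```python
def co_association(base_clusterings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    n = len(base_clusterings[0])
    weight = 1.0 / len(base_clusterings)
    m = [[0.0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += weight
    return m


def nmf_consensus(base_clusterings, init_labels, iters=200, eps=0.1):
    """Consensus labels from a symmetric NMF of the co-association matrix.

    Assumption: H is initialized as a soft indicator of `init_labels`
    (labels 0..k-1), so the consensus has k = max(init_labels) + 1 clusters.
    """
    m = co_association(base_clusterings)
    n = len(m)
    k = max(init_labels) + 1
    # Soft indicator: 1 for the initial cluster, a small positive value elsewhere.
    h = [[1.0 if init_labels[i] == c else eps for c in range(k)] for i in range(n)]
    for _ in range(iters):
        # g = H^T H (k x k), mh = M H (n x k), hg = H (H^T H) (n x k)
        g = [[sum(h[i][a] * h[i][b] for i in range(n)) for b in range(k)]
             for a in range(k)]
        mh = [[sum(m[i][j] * h[j][c] for j in range(n)) for c in range(k)]
              for i in range(n)]
        hg = [[sum(h[i][a] * g[a][c] for a in range(k)) for c in range(k)]
              for i in range(n)]
        # Damped multiplicative update; entries stay positive.
        h = [[h[i][c] * (0.5 + 0.5 * mh[i][c] / (hg[i][c] + 1e-12))
              for c in range(k)] for i in range(n)]
    # Hard assignment: index of the largest entry in each row of H.
    return [max(range(k), key=lambda c: row[c]) for row in h]


# Three base clusterings of six points with 2, 3, and 3 clusters.
bases = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 2, 2, 2],
    [0, 0, 0, 1, 1, 2],
]
consensus = nmf_consensus(bases, init_labels=bases[0])
print(consensus)
```

Because the co-association matrix is always n × n, base clusterings with different numbers of clusters can be combined without aligning their labels, which is why this representation is a common starting point for consensus methods.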
Keywords