ITM Web of Conferences (Jan 2017)
An Improved K-means Clustering Algorithm Applicable to Massive High-dimensional Matrix Datasets
Abstract
Since K-means clustering algorithm is easy to implement and high efficient, it has been widely used in cluster analysis of massive datasets. The value of k is difficult to determine in advance and the randomness of choosing initial centers leads to a series of social problems, such as instability, local optimal solution sensitivity to outliers. Results from hierarchical clustering are more natural than those from K-means clustering, but its high time complexity and space complexity makes it difficult to be applied to a large data set. In this paper, through combination of hierarchical clustering and K-means clustering, we have proposed an improved K-means clustering algorithm, and have done experiments using datasets provided by MovieLens.