ITM Web of Conferences (Jan 2017)

An Improved K-means Clustering Algorithm Applicable to Massive High-dimensional Matrix Datasets

  • Li Dong-Yuan,
  • Cao Cai-Feng

DOI
https://doi.org/10.1051/itmconf/20171104001
Journal volume & issue
Vol. 11
p. 04001

Abstract

Because the K-means clustering algorithm is easy to implement and highly efficient, it has been widely used in cluster analysis of massive datasets. However, the value of k is difficult to determine in advance, and the random choice of initial centers leads to a series of problems, such as instability, convergence to local optima, and sensitivity to outliers. Results from hierarchical clustering are more natural than those from K-means clustering, but its high time and space complexity makes it difficult to apply to large datasets. In this paper, by combining hierarchical clustering with K-means clustering, we propose an improved K-means clustering algorithm and report experiments on datasets provided by MovieLens.
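
The abstract does not say how the two methods are combined, so the following is only a minimal sketch of one common way to pair them, not the authors' algorithm. It assumes that a small sample is clustered agglomeratively (centroid linkage, stopped by a distance threshold) to obtain both k and the initial centers, and that ordinary K-means iterations then run over the full dataset. The function names `hierarchical_seeds` and `kmeans`, the `threshold` parameter, and the synthetic data are all hypothetical and chosen for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def hierarchical_seeds(sample, threshold):
    """Naive centroid-linkage agglomerative clustering on a small sample.

    Assumption for this sketch: merging stops once the two closest
    centroids are farther apart than `threshold`, so the surviving
    centroids determine both k and the initial centers.
    """
    clusters = [[p] for p in sample]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist(mean(clusters[i]), mean(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [mean(c) for c in clusters]


def nearest_center(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: dist(p, centers[c]))


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest_center(p, centers) for p in points]
    return centers, labels


if __name__ == "__main__":
    # Synthetic 2-D data: three well-separated Gaussian blobs (illustrative only).
    rng = random.Random(0)
    blobs = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blobs for _ in range(100)]
    sample = rng.sample(data, 30)          # hierarchical step sees only a sample
    seeds = hierarchical_seeds(sample, threshold=5.0)
    centers, labels = kmeans(data, seeds)  # K-means step sees all points
    print("k chosen by the hierarchical step:", len(centers))
```

Restricting the expensive hierarchical step to a sample keeps its quadratic-or-worse cost away from the full dataset, while the deterministic seeds remove the randomness of initial center selection. Whether the paper follows this particular division of labor is not stated in the abstract.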