An Improved K-means Clustering Algorithm Applicable to Massive High-dimensional Matrix Datasets

Li Dong-Yuan; Cao Cai-Feng

doi:10.1051/itmconf/20171104001

ITM Web of Conferences (Jan 2017)

Affiliations

Li Dong-Yuan: Wuyi University, School of Computer
Cao Cai-Feng: Wuyi University, School of Computer

DOI: https://doi.org/10.1051/itmconf/20171104001
Journal volume & issue: Vol. 11, p. 04001

Abstract

Because the K-means clustering algorithm is easy to implement and highly efficient, it has been widely used in the cluster analysis of massive datasets. However, the value of k is difficult to determine in advance, and the random choice of initial centers leads to a series of problems, such as unstable results, convergence to locally optimal solutions, and sensitivity to outliers. Hierarchical clustering produces more natural results than K-means clustering, but its high time and space complexity make it difficult to apply to large datasets. In this paper, we combine hierarchical clustering and K-means clustering to propose an improved K-means clustering algorithm, and we evaluate it experimentally on datasets provided by MovieLens.
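The abstract does not say exactly how the two methods are combined. One common hybrid, shown here as an assumption rather than the authors' method, runs agglomerative (centroid-linkage) hierarchical clustering on a small random sample, where its high cost is affordable, and then uses the resulting centroids as deterministic initial centers for K-means on the full dataset. All function names, the `sample_size` and `seed` parameters, and the toy data are hypothetical, and this sketch still takes k as an input.

```python
import random

# Hypothetical helpers: a sketch of one hierarchical + K-means hybrid,
# not the algorithm from the paper.


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def agglomerative_centroids(points, k):
    """Centroid-linkage agglomerative clustering: start from singletons and
    repeatedly merge the two clusters with the closest centroids until k remain.
    Naive O(m^3) in the number of points m, so it is run only on a sample."""
    clusters = [[p] for p in points]
    centroids = [list(p) for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(centroids[i], centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i (i < j)
        centroids[i] = mean(clusters[i])
        del clusters[j], centroids[j]
    return centroids


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            for p in points]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        groups = [[] for _ in centers]
        for p, lab in zip(points, labels):
            groups[lab].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:  # centers no longer move: converged
            break
        centers = new_centers
    return centers, assign(points, centers)


def hybrid_kmeans(points, k, sample_size=200, seed=0):
    """Seed K-means with centroids from hierarchical clustering of a sample."""
    rng = random.Random(seed)
    if len(points) <= sample_size:
        sample = points
    else:
        sample = rng.sample(points, sample_size)
    initial_centers = agglomerative_centroids(sample, k)
    return kmeans(points, initial_centers)


# Toy data: three well-separated 2-D groups (illustrative only).
pts = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
       [5.0, 5.0], [5.2, 4.9], [4.9, 5.1],
       [0.0, 9.0], [0.3, 9.2], [0.1, 8.8]]
centers, labels = hybrid_kmeans(pts, 3)
print(labels)
```

Seeding from the sample's hierarchy replaces random initialization, so repeated runs with the same seed give the same result. A variant closer to the abstract's concern about choosing k would stop merging once the closest pair of centroids exceeds a distance threshold and take the number of remaining clusters as k; that threshold would itself be a tuning assumption.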

Published in ITM Web of Conferences

ISSN: 2271-2097 (Online)
Publisher: EDP Sciences
Country of publisher: France
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.itm-conferences.org/
