k-Means Clustering Algorithm and Its Simulation Based on Distributed Computing Platform

Chunqiong Wu; Bingwen Yan; Rongrui Yu; Baoqin Yu; Xiukao Zhou; Yanliang Yu; Na Chen

doi:10.1155/2021/9446653

Complexity (Jan 2021)

Affiliations

Chunqiong Wu: Business College, Yango University, Fuzhou, Fujian Province 350015, China
Bingwen Yan: Business College, Yango University, Fuzhou, Fujian Province 350015, China
Rongrui Yu: Business College, Yango University, Fuzhou, Fujian Province 350015, China
Baoqin Yu: Business College, Yango University, Fuzhou, Fujian Province 350015, China
Xiukao Zhou: Business College, Yango University, Fuzhou, Fujian Province 350015, China
Yanliang Yu: Business College, Yango University, Fuzhou, Fujian Province 350015, China
Na Chen: Big Data Business Intelligence Engineering Research Center, Fujian University, Fuzhou, Fujian Province 350015, China

DOI: https://doi.org/10.1155/2021/9446653
Journal volume & issue: Vol. 2021

Abstract

At present, the explosive growth of data and its storage at massive scale have brought many problems to clustering research, such as high computational complexity and insufficient computing power. Through load balancing, a distributed computing platform dynamically allocates a large number of virtual computing resources. This effectively breaks through the bottlenecks of time and energy consumption and gives such platforms unique advantages in mining massive data. This paper studies parallel k-means in depth. The proposed method first performs random sampling for initialization and then parallelizes the distance calculation, exploiting the independence between data objects to perform cluster analysis in parallel. With MapReduce parallel processing, the distance calculations are spread across many nodes, which speeds up the algorithm. Finally, the clustering of the data objects is itself parallelized. Results show that our method provides services efficiently and stably and has good convergence.
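The pipeline summarized in the abstract (random-sampling initialization, per-node distance calculation in a map phase, merging of partial results in a reduce phase) can be illustrated with a minimal single-machine sketch. This is not the authors' implementation. The function names `map_partition`, `reduce_partials` and `parallel_kmeans`, the local combiner that emits per-cluster sums and counts, the convergence tolerance, and the rule that an empty cluster keeps its old center are all illustrative assumptions. A `ThreadPoolExecutor` merely stands in for the cluster nodes of a real MapReduce platform such as Hadoop.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def squared_distance(a, b):
    # Squared Euclidean distance; the square root is unnecessary for nearest-center search.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(partition, centers):
    # Hypothetical map task (with a local combiner): assign each point in this split
    # to its nearest center and accumulate a per-cluster (coordinate sums, count) pair.
    partial = {}
    for p in partition:
        idx = min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
        sums, count = partial.get(idx, ([0.0] * len(p), 0))
        partial[idx] = ([s + x for s, x in zip(sums, p)], count + 1)
    return partial


def reduce_partials(partials, old_centers):
    # Hypothetical reduce task: merge per-cluster sums from every map task into new means.
    dim = len(old_centers[0])
    sums = [[0.0] * dim for _ in old_centers]
    counts = [0] * len(old_centers)
    for partial in partials:
        for idx, (s, n) in partial.items():
            sums[idx] = [a + b for a, b in zip(sums[idx], s)]
            counts[idx] += n
    # Assumption: an empty cluster keeps its previous center.
    return [
        tuple(x / counts[i] for x in sums[i]) if counts[i] else old_centers[i]
        for i in range(len(old_centers))
    ]


def parallel_kmeans(points, k, n_splits=4, max_iter=100, tol=1e-12, seed=0):
    # Initialization by random sampling: k distinct data points become the first centers.
    centers = random.Random(seed).sample(points, k)
    # Split the data; each split stands in for the block one node would process.
    splits = [points[i::n_splits] for i in range(n_splits)]
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        for _ in range(max_iter):
            # Map phase: distance computations for all splits are submitted concurrently.
            partials = list(pool.map(map_partition, splits, [centers] * n_splits))
            new_centers = reduce_partials(partials, centers)
            # Stop once no center moves by more than tol (in squared distance).
            shift = max(squared_distance(a, b) for a, b in zip(centers, new_centers))
            centers = new_centers
            if shift <= tol:
                break
    return centers


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    print(sorted(parallel_kmeans(data, k=2, n_splits=3)))
```

Because each map task returns only one (sum, count) pair per cluster rather than one record per point, the data shuffled to the reducer stays small regardless of split size. This is the usual reason a combiner is paired with MapReduce k-means. In CPython the threads do not speed up this pure-Python arithmetic; they only model the independent per-split work that separate nodes would perform.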

Published in Complexity

ISSN: 1076-2787 (Print); 1099-0526 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/8503
