Dianzi Jishu Yingyong (May 2018)

Research on parallelization clustering algorithm for power communication big data

  • Zeng Ying,
  • Li Xingnan,
  • Liu Xinzhan

DOI
https://doi.org/10.16157/j.issn.0258-7998.180780
Journal volume & issue
Vol. 44, no. 5
pp. 1 – 4

Abstract

With the development of power communication technology, a large number of distributed power communication subsystems have emerged and massive amounts of power communication data have been generated. Extracting useful information from this data is an important task. Cluster analysis, an effective means of data processing and information mining, is widely used in power communication. However, traditional clustering algorithms cannot meet time-performance requirements when processing massive power data. To address this problem, a parallel k-medoids clustering algorithm based on the MapReduce model is proposed to support effective analysis and use of power data. The algorithm uses a density-based clustering method to optimize the selection of the initial k-medoids points, and is parallelized with the MapReduce programming framework on the Hadoop platform. Experimental results show that, compared with other algorithms, the improved parallel algorithm reduces clustering time and improves clustering accuracy.
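
The abstract does not give the details of the algorithm, so the following is only a minimal single-process sketch of the general idea, not the authors' implementation. It assumes Euclidean distance, a hypothetical neighbourhood `radius` for estimating density, and plain Python functions (`map_assign`, `reduce_update`) standing in for the map and reduce phases that a Hadoop job would run in parallel. All function and parameter names are illustrative.

```python
import math
from collections import defaultdict


def density_init(points, k, radius):
    """Pick k initial medoids from dense, well-separated regions.

    Assumption: density of a point = number of points within `radius`.
    """
    density = [sum(1 for q in points if math.dist(p, q) <= radius)
               for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    medoids = []
    # First pass: densest points that are far from medoids already chosen.
    for i in order:
        if len(medoids) == k:
            break
        if all(math.dist(points[i], m) > radius for m in medoids):
            medoids.append(points[i])
    # Fallback: fill any remaining slots with the next densest points.
    for i in order:
        if len(medoids) == k:
            break
        if points[i] not in medoids:
            medoids.append(points[i])
    return medoids


def map_assign(point, medoids):
    """Map phase: emit (index of nearest medoid, point)."""
    idx = min(range(len(medoids)),
              key=lambda j: math.dist(point, medoids[j]))
    return idx, point


def reduce_update(members):
    """Reduce phase: the new medoid minimizes total distance to its cluster."""
    return min(members, key=lambda c: sum(math.dist(c, p) for p in members))


def kmedoids(points, k, radius, max_iter=20):
    """Iterate map and reduce phases until the medoids stop changing."""
    medoids = density_init(points, k, radius)
    for _ in range(max_iter):
        groups = defaultdict(list)
        for p in points:                      # map + shuffle by key
            idx, pt = map_assign(p, medoids)
            groups[idx].append(pt)
        new_medoids = [reduce_update(groups[j]) if groups[j] else medoids[j]
                       for j in range(k)]     # one reduce call per key
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmedoids(data, k=2, radius=2.0))
```

In a real MapReduce job the assignment step would run independently on each data split and the medoid update once per cluster key; the loop above only mimics that data flow in one process.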

Keywords