Al-Rafidain Journal of Computer Sciences and Mathematics (May 2020)

HPPD: A Hybrid Parallel Framework of Partition-based and Density-based Clustering Algorithms in Data Streams

Ammar Abd Alazeez

DOI: https://doi.org/10.33899/csmj.2020.164677
Journal volume & issue: Vol. 14, no. 1, pp. 67–82

Abstract

Data stream clustering groups continuously arriving data chunks into continuously evolving clusters, which enables dynamic analysis of segmentation patterns. However, research on stream clustering has so far focused mainly on adapting methods designed for static datasets and on further modifying those adapted methods. Such methods produce only one type of output cluster, either convex or non-convex. This paper presents HPPD, a novel two-phase hybrid parallel clustering algorithm that identifies convex and non-convex clusters in the online stage and mixed clusters in the offline stage of a data stream. In this work, we first receive the data stream and apply a pre-processing step to identify convex and non-convex clusters. Second, we apply a modified EINCKM to produce the online convex clusters and a modified EDDS to produce the online non-convex clusters, running the two in parallel. Third, we apply an adaptive merging strategy in the offline stage to produce the final composite clusters. The method is evaluated on a synthetic dataset, and the experimental results confirm its efficiency and effectiveness.
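The abstract does not specify the internals of the modified EINCKM, the modified EDDS, or the adaptive merging strategy, so the following is only a minimal Python sketch of the two-phase structure under stated assumptions. Sequential (incremental) k-means stands in for EINCKM, connected components of an eps-neighbourhood graph stand in for EDDS, and the offline step merges two convex clusters whenever a single density group contains at least `min_shared` points of each. The pre-processing step is omitted, and all names (`online_convex`, `online_nonconvex`, `offline_merge`, `process_chunk`) and parameters (`eps`, `min_shared`, the initial seeds) are hypothetical.

```python
"""Hypothetical sketch of a two-phase hybrid stream-clustering pipeline.

Stand-ins (assumptions, NOT the paper's algorithms):
  online_convex    ~ modified EINCKM  -> sequential (incremental) k-means
  online_nonconvex ~ modified EDDS    -> eps-neighbourhood connected components
  offline_merge    ~ adaptive merging -> join convex clusters bridged by a density group
"""
import math
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

Point = tuple[float, ...]


def online_convex(chunk: list[Point], centroids: list[Point], counts: list[int]) -> list[int]:
    """Assign each point to its nearest centroid and move that centroid by a
    running-mean step, so earlier chunks never need to be stored."""
    labels = []
    for p in chunk:
        j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        counts[j] += 1
        step = 1.0 / counts[j]
        centroids[j] = tuple(c + step * (x - c) for c, x in zip(centroids[j], p))
        labels.append(j)
    return labels


def online_nonconvex(chunk: list[Point], eps: float) -> list[int]:
    """Label connected components of the graph linking points within eps
    (DBSCAN-like, but without core-point or noise handling)."""
    labels = [-1] * len(chunk)
    next_label = 0
    for start in range(len(chunk)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            a = stack.pop()
            for b, q in enumerate(chunk):
                if labels[b] == -1 and math.dist(chunk[a], q) <= eps:
                    labels[b] = next_label
                    stack.append(b)
        next_label += 1
    return labels


def offline_merge(convex: list[int], nonconvex: list[int], min_shared: int = 1) -> list[int]:
    """Merge two convex clusters when one density group holds at least
    min_shared points of each; return compact final labels per point."""
    parent = {c: c for c in set(convex)}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    overlap = defaultdict(Counter)
    for c, d in zip(convex, nonconvex):
        overlap[d][c] += 1  # points of convex cluster c inside density group d
    for counts in overlap.values():
        bridged = [c for c, n in counts.items() if n >= min_shared]
        for other in bridged[1:]:
            parent[find(other)] = find(bridged[0])

    roots: dict[int, int] = {}
    return [roots.setdefault(find(c), len(roots)) for c in convex]


def process_chunk(chunk, centroids, counts, eps):
    """Online stage: run both stand-in clusterers on the same chunk in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        convex = pool.submit(online_convex, chunk, centroids, counts)
        nonconvex = pool.submit(online_nonconvex, chunk, eps)
        return convex.result(), nonconvex.result()


if __name__ == "__main__":
    # A chain (x = 0..4) that k-means splits, plus a compact blob near (20, 0).
    chunk = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (20.0, 0.0), (20.0, 1.0)]
    centroids = [(0.0, 0.0), (4.0, 0.0), (20.0, 0.0)]  # assumed initial seeds
    counts = [1, 1, 1]  # each seed counts as one observation
    convex, nonconvex = process_chunk(chunk, centroids, counts, eps=1.5)
    print(convex)                            # [0, 0, 0, 1, 1, 2, 2]
    print(nonconvex)                         # [0, 0, 0, 0, 0, 1, 1]
    print(offline_merge(convex, nonconvex))  # [0, 0, 0, 0, 0, 1, 1]
```

In the demo, k-means cuts the chain into two convex pieces, the density pass keeps it whole, and the offline merge reunites the pieces into one mixed cluster. Threads here only mirror the parallel scheme named in the abstract; under CPython's global interpreter lock, CPU-bound clusterers would need a `ProcessPoolExecutor` for real speed-up, and the centroid state would then have to be returned rather than mutated in place. The `min_shared` threshold is just one simple way to keep a few bridging points from forcing a merge; the paper's adaptive criterion may differ.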

Keywords