IEEE Access (Jan 2020)

Online Clustering of Evolving Data Streams Using a Density Grid-Based Method

  • Mustafa Tareq,
  • Elankovan A. Sundararajan,
  • Masnizah Mohd,
  • Nor Samsiah Sani

DOI
https://doi.org/10.1109/ACCESS.2020.3021684
Journal volume & issue
Vol. 8
pp. 166472 – 166490

Abstract

In recent years, the availability of data from persistent data streams has grown significantly. These data streams evolve continually, and their clusters frequently form arbitrary shapes rather than regular shapes in the data space. This characteristic leads to an exponential increase in the processing time of traditional clustering algorithms for data streams. In this study, we propose a new online density grid-based method for data stream clustering. Its primary objectives are to reduce the number of distance function calls and to improve cluster quality. The method runs entirely online and consists of two main phases. The first phase generates the Core Micro-Clusters (CMCs), and the second phase combines the CMCs into macro clusters. A grid serves as an outlier buffer to handle multi-density data and noise. The method was tested on real and synthetic data streams using different quality metrics and was compared with a popular method for clustering evolving data streams into arbitrary shapes. The results show that the proposed method is an effective solution for reducing the number of calls to the distance function and improving cluster quality.
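
The two phases and the grid buffer described in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the class name `GridBufferedClusterer`, the parameters `radius`, `cell_size`, and `min_points`, the promotion rule, and the merge rule (centers within `2 * radius`) are all assumptions made for illustration. The sketch shows only the general idea that points which fit no existing micro-cluster can be placed in a grid cell by integer arithmetic, without further distance computations, and that dense cells can later be promoted to core micro-clusters.

```python
import math
from collections import defaultdict


class GridBufferedClusterer:
    """Toy online clusterer (illustrative assumption, not the paper's method)."""

    def __init__(self, radius=1.0, cell_size=1.0, min_points=3):
        self.radius = radius            # hypothetical CMC absorption radius
        self.cell_size = cell_size      # hypothetical grid cell width
        self.min_points = min_points    # hypothetical density threshold
        self.centers = []               # core micro-cluster (CMC) centers
        self.counts = []                # number of points behind each CMC
        self.buffer = defaultdict(list)  # grid cell -> buffered points
        self.distance_calls = 0         # counter used to observe the cost

    def _distance(self, a, b):
        self.distance_calls += 1
        return math.dist(a, b)

    def _cell(self, point):
        # Map a point to its grid cell using floor division only.
        return tuple(math.floor(x / self.cell_size) for x in point)

    def insert(self, point):
        # Phase 1a: absorb the point into the first CMC within `radius`.
        for i, center in enumerate(self.centers):
            if self._distance(point, center) <= self.radius:
                n = self.counts[i]
                self.centers[i] = tuple(
                    (c * n + x) / (n + 1) for c, x in zip(center, point)
                )
                self.counts[i] = n + 1
                return
        # Phase 1b: otherwise buffer it in the grid (no distance calls).
        key = self._cell(point)
        self.buffer[key].append(point)
        if len(self.buffer[key]) >= self.min_points:
            # A dense cell is promoted to a new CMC at the mean of its points.
            pts = self.buffer.pop(key)
            self.centers.append(tuple(sum(v) / len(pts) for v in zip(*pts)))
            self.counts.append(len(pts))

    def macro_clusters(self):
        # Phase 2: label connected components of CMCs whose centers
        # lie within 2 * radius of each other.
        n = len(self.centers)
        labels = [-1] * n
        current = 0
        for start in range(n):
            if labels[start] != -1:
                continue
            labels[start] = current
            stack = [start]
            while stack:
                i = stack.pop()
                for j in range(n):
                    if labels[j] == -1 and self._distance(
                        self.centers[i], self.centers[j]
                    ) <= 2 * self.radius:
                        labels[j] = current
                        stack.append(j)
            current += 1
        return labels


if __name__ == "__main__":
    model = GridBufferedClusterer()
    stream = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3),
              (10.1, 10.1), (10.2, 10.2), (10.3, 10.3),
              (50.5, 50.5)]  # the last point stays in the outlier buffer
    for p in stream:
        model.insert(p)
    print(len(model.centers), model.macro_clusters(), dict(model.buffer))
```

In this sketch the isolated point never becomes a cluster; it remains in the grid buffer until enough neighbours arrive in the same cell. A real stream clusterer would also need decay or pruning of old cells and micro-clusters, which is omitted here.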

Keywords