IEEE Access (Jan 2024)
A Streaming Data Clustering Method Based on Dual Strategies Improved DENCLUE
Abstract
Streaming data arrives continuously and is fast, massive, dynamically evolving, and unstable. Unlike traditional static data clustering, streaming data clustering algorithms must handle concept drift and outliers, and must identify and update dynamic cluster patterns. DENCLUE is a classical density-based clustering algorithm that uses nonparametric estimation to infer the distribution of the whole dataset from a finite number of samples. However, the Kernel Density Estimation (KDE) window width and the density threshold of the basic DENCLUE algorithm are difficult to choose, so the algorithm cannot be applied directly to streaming data clustering. In this paper, we therefore propose a streaming data clustering method that improves DENCLUE with two strategies, KDE optimization and two-stage clustering, and that accounts for concept drift in streaming data. First, we propose a KDE-based density threshold optimization method to address the difficulty of selecting the KDE window width and the density threshold in the traditional DENCLUE algorithm. Second, we design a two-stage clustering and merging method to improve on the clustering performance of traditional DENCLUE. Experimental results show that our algorithm outperforms the traditional CluStream and DenStream algorithms on datasets containing clusters of arbitrary shapes and sizes, and that it performs well on streaming data clustering.
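To make the two DENCLUE parameters mentioned in the abstract concrete, the following is a minimal sketch of the standard DENCLUE ingredients, not the authors' dual-strategy method: a Gaussian kernel density estimate with window width h, hill climbing from each point to a density attractor, and a density threshold xi below which attractors are treated as noise. The function names (kde_density, silverman_bandwidth, hill_climb, denclue) and the merge_eps parameter are hypothetical. Several simplifications are assumptions: hill climbing uses a mean-shift style fixed-point update, attractors are merged by a simple distance rule rather than DENCLUE's density-connected paths, and Silverman's normal-reference rule is swapped in only as one common heuristic for h.

```python
import math
from statistics import stdev


def kde_density(x, data, h):
    """Gaussian kernel density estimate at point x with window width h."""
    d = len(x)
    norm = len(data) * (h * math.sqrt(2 * math.pi)) ** d
    total = sum(math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in data)
    return total / norm


def silverman_bandwidth(data):
    """Normal-reference rule of thumb for h (a heuristic, not the paper's optimizer).

    Uses the per-dimension sample standard deviation, averaged over dimensions.
    """
    n, d = len(data), len(data[0])
    sigma = sum(stdev([p[j] for p in data]) for j in range(d)) / d
    return sigma * (4 / ((d + 2) * n)) ** (1 / (d + 4))


def hill_climb(x, data, h, tol=1e-5, max_iter=100):
    """Move x uphill to a density attractor with a mean-shift style update."""
    for _ in range(max_iter):
        # Gaussian weight of every sample relative to the current position.
        w = [math.exp(-math.dist(x, p) ** 2 / (2 * h * h)) for p in data]
        total = sum(w)
        # The new position is the kernel-weighted mean of all samples.
        new_x = tuple(
            sum(wi * p[j] for wi, p in zip(w, data)) / total for j in range(len(x))
        )
        if math.dist(new_x, x) < tol:
            return new_x
        x = new_x
    return x


def denclue(data, h, xi, merge_eps=None):
    """Label each point with a cluster id, or -1 if its attractor density is below xi."""
    merge_eps = h / 2 if merge_eps is None else merge_eps  # hypothetical default
    centers = []  # one representative attractor per cluster; index = cluster id
    labels = []
    for p in data:
        a = hill_climb(p, data, h)
        if kde_density(a, data, h) < xi:
            labels.append(-1)  # noise: attractor density below threshold xi
            continue
        # Simplified merging: reuse a cluster whose attractor lies within merge_eps.
        for cid, c in enumerate(centers):
            if math.dist(c, a) < merge_eps:
                labels.append(cid)
                break
        else:
            centers.append(a)
            labels.append(len(centers) - 1)
    return labels
```

For example, with h = 0.5 and xi = 0.1, two tight groups of four points near (0, 0) and (5, 5) plus an isolated point at (20, 20) yield the labels [0, 0, 0, 0, 1, 1, 1, 1, -1]. On the same two groups, the rule-of-thumb bandwidth comes out noticeably wider than the spread within each group, which illustrates why a fixed heuristic for h, combined with a hand-picked xi, is fragile on multimodal or drifting data.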
Keywords