Jisuanji kexue (Jul 2022)

Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition

  • CHEN Yuan-yuan, WANG Zhi-hai

DOI
https://doi.org/10.11896/jsjkx.210600155
Journal volume & issue
Vol. 49, no. 7
pp. 25 – 30

Abstract


The analysis and utilization of latent information in data streams is an important part of data stream mining. Concept drift, in which the distribution of the data changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Currently, some concept drift detection methods use a tree structure or a grid to build a histogram that describes the data distribution. However, tree structures are prone to detection blind spots and lead to poor interpretability, while grid methods consume too much memory on multi-dimensional data. To solve these problems, a concept drift detection method for multi-dimensional data streams, called partition based on uniform density clusters (PUDC), is proposed. The algorithm uses k-Means to partition the data into regions of uniform density and applies the chi-square test to statistics computed for each partition to detect concept drift. To verify the validity of the method, four artificial datasets and three real datasets were selected for experiments, and the type I and type II error rates on data of different dimensions were compared and analyzed. Experimental results show that the PUDC algorithm is superior to several recent algorithms in concept drift detection for multi-dimensional data streams.
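The general idea of partitioning a reference window with k-Means and then applying a chi-square test to the per-partition counts of a new window can be sketched as follows. This is a minimal illustration, not the authors' PUDC algorithm: it uses plain Lloyd's k-Means and does not reproduce the uniform-density partitioning step, which the abstract does not specify. All function names, the window sizes, the choice of k = 4, and the synthetic Gaussian data are assumptions made for the example.

```python
import math
import random

# Chi-square critical value for df = 3 (k - 1 with k = 4) at alpha = 0.05.
CRITICAL_DF3_ALPHA05 = 7.815


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-Means (assumption: stands in for the paper's partitioning)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep the old one if empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids


def partition_counts(points, centroids):
    """Histogram of points over the partitions induced by the centroids."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts


def chi_square_stat(ref_counts, new_counts):
    """Goodness-of-fit statistic: new counts vs. proportions of the reference window."""
    n_ref, n_new = sum(ref_counts), sum(new_counts)
    stat = 0.0
    for r, c in zip(ref_counts, new_counts):
        expected = n_new * r / n_ref
        if expected > 0:  # partitions that are empty in the reference are skipped
            stat += (c - expected) ** 2 / expected
    return stat


def gaussian_window(rng, n, mean, dim=3):
    """Synthetic window of n points with i.i.d. N(mean, 1) coordinates."""
    return [tuple(rng.gauss(mean, 1.0) for _ in range(dim)) for _ in range(n)]


rng = random.Random(42)
reference = gaussian_window(rng, 500, 0.0)  # reference window
shifted = gaussian_window(rng, 500, 3.0)    # window after a simulated drift

centroids = kmeans(reference, k=4)
ref_counts = partition_counts(reference, centroids)
stat = chi_square_stat(ref_counts, partition_counts(shifted, centroids))
print("drift detected:", stat > CRITICAL_DF3_ALPHA05)
```

In this sketch the partitions are fixed from the reference window, so only one counter per partition has to be kept for each incoming window. The critical value is hard-coded for k = 4; a different k would need the corresponding chi-square quantile.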

Keywords