Concept Drift Detection Method for Multidimensional Data Stream Based on Clustering Partition

CHEN Yuan-yuan, WANG Zhi-hai

doi:10.11896/jsjkx.210600155

Jisuanji kexue (Jul 2022)

Affiliations

CHEN Yuan-yuan, WANG Zhi-hai: School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Laboratory of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China

DOI: https://doi.org/10.11896/jsjkx.210600155
Journal volume & issue: Vol. 49, no. 7
pp. 25 – 30

Abstract

Analyzing and exploiting the latent information in data streams is an important part of data stream mining. Concept drift, in which the data distribution changes over time, is a major challenge for data stream mining. Detecting changes in the data distribution is a direct and effective way to detect concept drift. Some current concept drift detection methods use a tree structure or a grid to build a histogram that describes the data distribution. However, tree structures are prone to detection blind spots and offer poor interpretability, while grid methods consume too much memory on multi-dimensional data. To address these problems, a concept drift detection method for multi-dimensional data streams, called partition based on uniform density clusters (PUDC), is proposed. The algorithm uses k-means to partition the data into regions of uniform density and applies the chi-square test to statistics computed for each partition to detect concept drift. To verify the validity of the method, experiments were run on four artificial datasets and three real datasets, and the type I and type II error rates were compared and analyzed for data of different dimensions. The experimental results show that PUDC outperforms several recent algorithms at concept drift detection in multi-dimensional data streams.
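The general idea in the abstract (partition the space with k-means, build a histogram over the partitions, and compare windows with a chi-square test) can be illustrated with a minimal sketch. This is not the PUDC algorithm itself: the abstract does not say how uniform-density partitions are constructed, so plain Lloyd's k-means stands in for that step, and the names `kmeans`, `histogram`, `chi2_sf`, and `detect_drift`, along with the defaults `k=4` and `alpha=0.05`, are assumptions made for illustration only.

```python
import math
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (a stand-in for the paper's partitioning step)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def histogram(window, centroids):
    """Count how many points of the window fall into each partition."""
    counts = [0] * len(centroids)
    for p in window:
        counts[nearest(p, centroids)] += 1
    return counts


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(h) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (i + 0.5)
    return total


def detect_drift(reference, current, k=4, alpha=0.05, seed=0):
    """Two-sample chi-square test on partition counts; returns (drift, p_value)."""
    centroids = kmeans(reference, k, seed=seed)
    ref = histogram(reference, centroids)
    cur = histogram(current, centroids)
    n_ref, n_cur = sum(ref), sum(cur)
    stat, used = 0.0, 0
    for r, c in zip(ref, cur):
        if r + c == 0:  # skip partitions that are empty in both windows
            continue
        used += 1
        for obs, n in ((r, n_ref), (c, n_cur)):
            expected = (r + c) * n / (n_ref + n_cur)
            stat += (obs - expected) ** 2 / expected
    p_value = chi2_sf(stat, max(used - 1, 1))
    return p_value < alpha, p_value
```

Calling `detect_drift(reference_window, current_window)` with two lists of equal-length numeric vectors returns a flag and a p-value. In this sketch a type I error corresponds to the flag being raised when both windows come from the same distribution, and a type II error to the flag staying down when the distribution has changed.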

Keywords: data stream mining; concept drift detection; k-means; hypothesis test; histogram

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
