Journal of Big Data (Sep 2020)

Learning in the presence of concept recurrence in data stream clustering

  • K. Namitha,
  • G. Santhosh Kumar

DOI
https://doi.org/10.1186/s40537-020-00354-1
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 28

Abstract


In real-world data streams, the underlying data distribution is not static; it varies over time, and this variation is the primary cause of concept drift. Concept drift poses serious problems for the accuracy of a model in online learning scenarios. A recurring concept is a special case of concept drift in which concepts seen in the past reappear as the stream evolves. This problem has not yet been studied in the context of stream clustering. This paper proposes a novel algorithm for identifying recurring concepts in data stream clustering. During concept recurrence, the best-matching model is retrieved from a repository and reused. The algorithm has minimal memory requirements and works online with the stream. Some concepts and definitions already familiar from studies of concept recurrence in stream classification have been redefined for clustering. Experiments on real and synthetic data streams show that the proposed algorithm has the potential to identify recurring concepts.
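The retrieve-and-reuse idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that each clustering model is summarized by a non-empty list of centroids, that drift is detected elsewhere, and that two models "match" when the mean distance from each current centroid to its nearest stored centroid is within a fixed threshold. The names `ConceptRepository`, `model_distance`, `on_drift`, and `threshold` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def model_distance(current: List[Point], stored: List[Point]) -> float:
    """Assumed similarity measure: mean distance from each current
    centroid to its nearest centroid in the stored model."""
    return sum(min(euclidean(c, s) for s in stored) for c in current) / len(current)


class ConceptRepository:
    """Hypothetical store of past clustering models (lists of centroids)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.models: List[List[Point]] = []

    def best_match(self, current: List[Point]) -> Optional[int]:
        """Index of the closest stored model, or None if none is within threshold."""
        best_idx, best_dist = None, math.inf
        for idx, stored in enumerate(self.models):
            d = model_distance(current, stored)
            if d < best_dist:
                best_idx, best_dist = idx, d
        return best_idx if best_dist <= self.threshold else None

    def on_drift(self, current: List[Point]) -> Tuple[List[Point], bool]:
        """Called after a drift is detected: reuse a matching past model
        (a recurring concept) or archive the current one as a new concept.
        Returns (model_to_use, recurred)."""
        if not current:
            raise ValueError("a model needs at least one centroid")
        idx = self.best_match(current)
        if idx is not None:
            return self.models[idx], True
        self.models.append(list(current))
        return current, False


if __name__ == "__main__":
    repo = ConceptRepository(threshold=0.5)
    concept_a = [(0.0, 0.0), (5.0, 5.0)]
    concept_b = [(10.0, 0.0), (0.0, 10.0)]
    repo.on_drift(concept_a)  # new concept, archived
    repo.on_drift(concept_b)  # new concept, archived
    model, recurred = repo.on_drift([(0.1, -0.1), (5.1, 4.9)])
    print(recurred, model)  # concept_a is retrieved and reused
```

Storing only centroids keeps the repository's size proportional to the number of concepts times the number of clusters, which is one simple way to keep memory low; the paper's own model representation and matching criterion may differ.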

Keywords