IEEE Access (Jan 2024)
PICTURE—A Framework to Assess the Degree of Approximation of Summarized Time Series
Abstract
The analysis of time series data, which represents dynamic phenomena through sequences of observations, is greatly influenced by Big Data. Both the sheer volume and the advanced capabilities of Big Data significantly affect how these analyses are conducted, enabling more comprehensive and detailed insights. Recent studies have promoted the use of data summarization techniques, for instance through incremental clustering, to address the challenges of Big Data volume. These techniques quickly capture data evolution, thereby helping domain experts make informed and proactive decisions by leveraging a concise representation of time series. However, although incremental clustering efficiently reduces data volume and retains key statistical information, it is important to evaluate the accuracy of the summarized version against the original time series data. This assessment is critical when the summarized data is used as the basis for complex analytical pipelines, such as those for pattern recognition and anomaly detection.
Motivated by these premises, and building on empirical experience in defining a metric to assess the adherence of summarized time series to the original data stream, in this paper we: (i) propose a variant of a renowned quality metric for incremental clustering, based on an abstract model of clustering data structures, to assess the extent to which the time series summary accurately captures the dynamics of the original data; (ii) present PICTURE (Python-based Incremental Clustering for Time series Representation and Evaluation), a framework featuring four widely used incremental clustering algorithms from the literature, equipped with modules for the execution, representation, and evaluation of clustering results applied to time series according to the abstract model; (iii) conduct an extensive qualitative and quantitative analysis of incremental clustering results on one synthetic and two real-world datasets using the PICTURE framework, to demonstrate the effectiveness of the proposed metric in assessing the degree of approximation of summarized time series.
Keywords