Jisuanji kexue yu tansuo (May 2024)
Clustering Multivariate Time Series Data Based on Shape Extraction with Compactness Constraint
Abstract
To address the naturalness and structural complexity of multivariate time series (MTS) data, as well as the inability of existing algorithms to accurately identify clusters in high-dimensional time series data, this paper proposes C-Shape, a shape-extraction-based MTS clustering algorithm with a compactness constraint. First, C-Shape applies largest-triangle-three-buckets (LTTB) down-sampling to the complex MTS, so that fewer data points are used while the original shape is preserved. The compactness between the raw data and the processed data is then calculated to ensure that the reduced dimensionality is reasonable. Next, new cluster centers are obtained by shape extraction, which effectively preserves the shape integrity of the data, and the final clusters are formed by iteration. Because it fully takes into account the shape similarity between the processed data and the raw data, C-Shape avoids the difficulty that traditional down-sampling algorithms have in choosing an appropriate reduced dimensionality. To validate its performance, C-Shape is compared with two classical time series clustering algorithms and seven strong ones proposed in recent years, on eight normal and four imbalanced MTS datasets with dimensions ranging from tens to thousands. Experimental results show that C-Shape outperforms all nine baseline algorithms in clustering performance, with an average improvement of 16.33% in Rand index and an average improvement of 69.71% in time performance. C-Shape is therefore an accurate and efficient multivariate time series clustering algorithm.
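The largest-triangle-three-buckets step named in the abstract can be illustrated with a minimal sketch of the generic LTTB procedure for a single univariate series. This is not the authors' implementation: the function name `lttb`, the parameter `n_out`, and the use of the sample index as the time axis are assumptions made for illustration, and the abstract does not state how the step is applied across the variables of an MTS or how the compactness constraint is computed, so neither is shown here.

```python
import numpy as np

def lttb(y, n_out):
    """Down-sample a 1-D series to n_out points with generic LTTB.

    Hypothetical helper for illustration; returns (kept indices, kept values).
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    if n_out < 3:
        raise ValueError("n_out must be at least 3")
    if n_out >= n:
        return np.arange(n), y.copy()

    x = np.arange(n, dtype=float)  # assumption: sample index is the time axis
    # The n - 2 interior points are split into n_out - 2 buckets;
    # the first and last points are always kept.
    edges = np.linspace(1, n - 1, n_out - 1).astype(int)

    keep = [0]
    a = 0  # index of the most recently selected point
    for i in range(n_out - 2):
        lo, hi = edges[i], edges[i + 1]
        # Third triangle vertex: mean of the next bucket,
        # or the final point when this is the last bucket.
        if i + 2 < len(edges):
            nlo, nhi = edges[i + 1], edges[i + 2]
            cx, cy = x[nlo:nhi].mean(), y[nlo:nhi].mean()
        else:
            cx, cy = x[-1], y[-1]
        # Twice the triangle area for every candidate in the current bucket.
        area = np.abs((x[a] - cx) * (y[lo:hi] - y[a])
                      - (x[a] - x[lo:hi]) * (cy - y[a]))
        a = lo + int(np.argmax(area))
        keep.append(a)
    keep.append(n - 1)

    idx = np.array(keep)
    return idx, y[idx]

# Usage: a flat series with one spike keeps the spike after down-sampling.
series = np.zeros(100)
series[55] = 10.0
idx, reduced = lttb(series, 10)
```

Selecting, in each bucket, the point that forms the largest triangle with its neighbours is what lets the reduced series retain visually salient features such as peaks, which is the property the abstract relies on when it says the original shape is kept.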
Keywords