IEEE Access (Jan 2020)
Two-Phase Multivariate Time Series Clustering to Classify Urban Rail Transit Stations
Abstract
Consider the problem of clustering objects described by several variables that change over time, for instance, classifying cities by several changing socioeconomic indices in geographical research. If these variables are recorded simultaneously as a multivariate time series, in which all subseries have equal length and may be correlated, the problem becomes a multivariate time series clustering problem. Existing methods consider the correlations between distinct time series but overlook the shape of each time series, so multivariate time series with similar correlations but opposite shapes are clustered into the same class. To overcome this problem, this paper proposes a two-phase multivariate time series clustering algorithm that considers both correlation and shape. In Phase I, the discrete wavelet transform is applied to obtain the wavelet variances and the correlation coefficients between each pair of variables, which yield an initial clustering of the multivariate time series; at this stage, time series with similar correlations but opposite shapes may still be assigned to the same cluster. In Phase II, the multivariate time series are clustered by shape using the symbolic aggregate approximation (SAX) method, which differentiates time series with similar correlations but opposite shapes. The method is evaluated on multivariate time series of incoming and outgoing passenger volumes derived from Beijing IC card data collected between March 4, 2013 and March 17, 2013. Based on the silhouette coefficient, our approach outperforms two popular multivariate time series clustering methods: a wavelet-based method and the SAX method.
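The motivation for the second phase can be illustrated with a small sketch. It is not the authors' implementation: a plain Pearson correlation stands in for the wavelet-based correlation of Phase I, and the SAX step assumes z-normalization, piecewise aggregate approximation over equal-width segments, and equiprobable Gaussian breakpoints. The function names (`znorm`, `paa`, `sax_word`, `pearson`) and the toy passenger counts are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series (zero mean, unit population std)."""
    m, s = mean(series), pstdev(series)
    if s == 0:
        return [0.0] * len(series)
    return [(v - m) / s for v in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-width segment.

    Assumes len(series) is divisible by n_segments (a simplification).
    """
    width = len(series) // n_segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(n_segments)]


def sax_word(series, n_segments=4, alphabet="abcd"):
    """Map a series to a SAX word using equiprobable Gaussian breakpoints."""
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(k / a) for k in range(1, a)]
    letters = []
    for value in paa(znorm(series), n_segments):
        index = sum(value > b for b in breakpoints)  # number of breakpoints below value
        letters.append(alphabet[index])
    return "".join(letters)


def pearson(x, y):
    """Pearson correlation; a stand-in for the wavelet correlation of Phase I."""
    zx, zy = znorm(x), znorm(y)
    return mean([p * q for p, q in zip(zx, zy)])


# Toy data (hypothetical): two stations, each with an incoming and an outgoing series.
station_a = {"in": [1, 2, 3, 4, 5, 6, 7, 8], "out": [2, 3, 4, 5, 6, 7, 8, 9]}
station_b = {"in": [8, 7, 6, 5, 4, 3, 2, 1], "out": [9, 8, 7, 6, 5, 4, 3, 2]}

for name, st in (("A", station_a), ("B", station_b)):
    corr = round(pearson(st["in"], st["out"]), 6)
    print(name, "in/out correlation:", corr, "SAX word (in):", sax_word(st["in"]))
```

In this toy case both stations have the same correlation between incoming and outgoing volumes, so a correlation-only criterion cannot separate them, while the SAX words of the rising and falling series differ. A shape-based second phase can use that difference to split them.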
Keywords