IEEE Access (Jan 2022)
Variable-Length Multivariate Time Series Classification Using ROCKET: A Case Study of Incident Detection
Abstract
Multivariate time series classification is a machine learning problem that can be applied to automate a wide range of real-world data analysis tasks. RandOm Convolutional KErnel Transform (ROCKET) proved to be an outstanding algorithm capable to classify time series accurately and quickly. The textbook variant of the multivariate time series classification problem assumes that time series to be classified are all of the same length, while in real-world applications this assumption not necessarily holds. The literature of this domain does not pay enough attention to data processing pipelines for variable-length time series. Thus, in this paper, we present a thorough analysis of three preprocessing pipelines that handle variable-length time series that need to be classified with a method that requires the data to be of equal length. These three methods are truncation, padding, and forecasting of missing value. Experiments conducted on benchmark datasets, showed that the recommended procedure involves padding. Forecasting ensures similar classification accuracy, but comes at a much higher computational cost. Truncation is not a viable option. Furthermore, in the paper, we present a novel domain of application of multivariate time series classification algorithms, that is incident detection in cash transactions. This area poses substantive challenges for automated model training procedures since the data is not only variable-length, but also heavily imbalanced. In the study, we list various incident types and present trained classifiers capable to aid human auditors in their daily work.
Keywords