Engineering Reports (Nov 2022)
A set of time‐series classification datasets based on the average price of concrete in major Chinese cities
Abstract
Time series classification (TSC) is an important and challenging problem in data mining. Time series datasets are a key foundation for this research and are widely used as baselines for verifying algorithms and models. Because few Chinese datasets exist and the commonly used TSC datasets are relatively old, we establish a new set of time‐series classification datasets based on average concrete price data from major Chinese cities, providing new data for TSC algorithm research. Using sample data on the average price of concrete from October 23, 2013 to January 20, 2021, published by the Oriental Fortune data center, we created 1093 autoregression‐based series datasets with sliding windows of different lengths. After experimental verification with linear regression, decision tree, and random forest models, we selected a set of datasets with lengths of 170, 842, and 1052. We then used three convolutional neural network (CNN) models and three long short‐term memory (LSTM) network models to verify the validity of the data: the best CNN accuracy was 93.20%, and the best LSTM accuracy was 92.99%. The new datasets are of value to TSC research and offer a reference for other researchers who create datasets. The datasets are freely available at https://gitee.com/lq2012/tsc-dataset.
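The sliding-window step can be illustrated with a minimal sketch, assuming a univariate price series and an autoregressive framing in which each window of `window` consecutive values is paired with the value that immediately follows it. The abstract does not say how class labels are assigned, so this shows only the windowing; the function names `sliding_windows` and `datasets_for_lengths` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def sliding_windows(
    series: Sequence[float], window: int
) -> Tuple[List[List[float]], List[float]]:
    """Split a univariate series into autoregressive samples.

    Assumption: each sample uses `window` consecutive values as inputs and
    the next value as its target; the paper's exact labelling is not given.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    # Slide one step at a time, stopping early enough that a target exists.
    for start in range(len(series) - window):
        inputs.append(list(series[start:start + window]))
        targets.append(series[start + window])
    return inputs, targets


def datasets_for_lengths(
    series: Sequence[float], lengths: Sequence[int]
) -> Dict[int, Tuple[List[List[float]], List[float]]]:
    """Build one dataset per window length (hypothetical helper)."""
    return {w: sliding_windows(series, w) for w in lengths}


if __name__ == "__main__":
    prices = [1.0, 2.0, 3.0, 4.0, 5.0]
    X, y = sliding_windows(prices, 3)
    print(X)  # [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
    print(y)  # [4.0, 5.0]
```

Varying the window length over a range of values, as `datasets_for_lengths` does, yields one candidate dataset per length; longer windows give fewer but wider samples.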
Keywords