Heliyon (Apr 2024)
Developing a multivariate time series forecasting framework based on stacked autoencoders and multi-phase feature
Abstract
Time series forecasting across different domains has received considerable attention because it supports intelligent decision-making. Recurrent neural networks and various other deep learning algorithms have been applied to modeling and forecasting multivariate time series data. Because real-world time series exhibit intricate non-linear patterns, and the randomness of their characteristics varies significantly across categories, achieving effectiveness and robustness simultaneously is a considerable challenge for individual deep learning models. To fill this gap, we propose a novel prediction framework that combines a multi-phase feature selection technique, a long short-term memory-based autoencoder, and a temporal convolution-based autoencoder. The multi-phase feature selection is applied to obtain the optimal feature subset and the optimal lag window length for each feature. Moreover, a customized stacked autoencoder strategy is employed in the model. The first autoencoder is used to resolve the random weight initialization problem. The second autoencoder models the temporal relations among non-linearly correlated features using convolutional networks and recurrent neural networks. Finally, the model's ability to generalize, predict accurately, and perform effectively is validated through experiments on three distinct real-world time series datasets: Energy Appliances, Beijing PM2.5 Concentration, and Solar Radiation. The Energy Appliances dataset consists of 29 attributes, with 15,464 instances in the training set and 4,239 instances in the testing set. The Beijing PM2.5 Concentration dataset has 18 attributes, with 34,952 instances in the training set and 8,760 instances in the testing set. The Solar Radiation dataset comprises 11 attributes, with 22,857 instances in the training set and 9,797 instances in the testing set.
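The idea of giving each feature its own lag window length can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `make_lagged_features`, the choice of column 0 as the forecasting target, the one-step-ahead horizon, and the example lag values are all assumptions made for illustration. The multi-phase selection that would actually choose the lags is not shown.

```python
import numpy as np

def make_lagged_features(data, lags, target_col=0):
    """Build a supervised dataset with a separate lag window per feature.

    data       : array of shape (T, F), one column per feature
    lags       : list of F positive ints; lags[j] is the window length
                 used for feature j (assumed to be chosen beforehand)
    target_col : column predicted one step ahead (an assumption here)
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or len(lags) != data.shape[1]:
        raise ValueError("need a 2-D array and one lag per feature")
    max_lag = max(lags)
    if data.shape[0] <= max_lag:
        raise ValueError("series is shorter than the longest lag window")

    rows = []
    for t in range(max_lag, data.shape[0]):
        # For each feature, take only its own most recent `lag` values.
        parts = [data[t - lag:t, j] for j, lag in enumerate(lags)]
        rows.append(np.concatenate(parts))

    X = np.array(rows)                 # shape (T - max_lag, sum(lags))
    y = data[max_lag:, target_col]     # value at time t, aligned with X
    return X, y

# Toy usage: 10 time steps, 2 features, lag 3 for the first and 1 for the second.
toy = np.arange(20).reshape(10, 2)
X, y = make_lagged_features(toy, lags=[3, 1])
print(X.shape, y.shape)
```

Each row of `X` concatenates the per-feature windows, so features with weaker temporal dependence contribute fewer inputs than features with longer memory.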
The experimental setup evaluated the forecasting models using two error measures: root mean square error and mean absolute error. To ensure a fair evaluation, the errors were calculated on the same scale as the data. The experimental results demonstrate the superiority of the proposed model over existing models, as evidenced by significant advantages in metrics such as mean squared error and mean absolute error. For the PM2.5 air quality data, the proposed model achieves a mean absolute error of 7.51 compared with 12.45, an improvement of about 40%. Similarly, the mean squared error for this dataset improves from 23.75 to 11.62, an improvement of about 51%. For the solar radiation dataset, the proposed model achieves an improvement of about 34.7% in mean squared error and about 75% in mean absolute error. The proposed framework demonstrates strong generalization capabilities and outperforms existing models on datasets spanning multiple domains.
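As a rough check on the percentages quoted above, the following sketch computes the two error measures and the relative improvement of one error value over another. The helper names `rmse`, `mae`, and `relative_improvement` are illustrative and not taken from the paper; the only figures used are the PM2.5 error values reported in the abstract.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# PM2.5 figures quoted in the abstract.
print(round(relative_improvement(12.45, 7.51), 1))   # 39.7
print(round(relative_improvement(23.75, 11.62), 1))  # 51.1
```

Both values agree with the rounded improvements of about 40% and about 51% stated for the PM2.5 dataset. If the models are trained on normalized inputs, predictions would need to be mapped back to the original units before calling `rmse` or `mae` for the errors to be on the data's own scale.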