Heliyon (Nov 2022)
Spatiotemporal prediction of O3 concentration based on the KNN-Prophet-LSTM model
Abstract
This paper establishes a prediction method based on a KNN-Prophet-LSTM hybrid model, using daily pollutant concentration data for Wuhan from January 1, 2014, to May 3, 2021, and taking both temporal and spatial characteristics into account. First, the data are decomposed into trend, periodic, and error terms by the Prophet decomposition method. To exploit the respective strengths of the Prophet and Long Short-Term Memory (LSTM) models, the trend and periodic terms are predicted by the Prophet model and the error terms by the LSTM model. The K-Nearest Neighbor (KNN) algorithm is added to fuse spatial and temporal information, and the ozone (O3) concentration is predicted day by day. To demonstrate the effectiveness and rationality of the KNN-Prophet-LSTM hybrid model, four groups of comparative experiments compare it with the single models Autoregressive Integrated Moving Average (ARIMA), Prophet, and LSTM, and with the hybrid model Prophet-LSTM. The experimental results show the following. (1) The daily maximum 8-hour average O3 concentration in Wuhan varies with a significant periodicity. Differences in the surrounding environment lead to differences in how the O3 concentration changes across the region, while O3 concentration changes at similar stations are highly similar. (2) Decomposing the original time series with the Prophet algorithm effectively extracts the time series information and removes noise, which markedly improves prediction accuracy. (3) Incorporating the spatial information of surrounding sites through the KNN algorithm further improves the accuracy of the model. Compared with the baseline ARIMA model, accuracy improves by approximately 49.76% in terms of mean absolute error (MAE) and 46.81% in terms of root mean square error (RMSE). (4) The hybrid models generally predict better than the single models and achieve higher prediction accuracy.
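The MAE and RMSE comparison against the ARIMA baseline can be illustrated with a minimal sketch. It assumes that "improvement" means the relative reduction in error with respect to the baseline, which the abstract does not state explicitly. The function names and the toy values below are hypothetical and are not data from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline.

    Assumption: this is how an improvement figure would be derived;
    the abstract does not give the formula.
    """
    return 100.0 * (baseline_error - model_error) / baseline_error


# Toy daily O3 values (hypothetical, for illustration only).
observed = [80.0, 95.0, 110.0, 70.0]
baseline_forecast = [70.0, 105.0, 100.0, 90.0]  # stands in for a baseline model
hybrid_forecast = [76.0, 99.0, 106.0, 78.0]     # stands in for a hybrid model

mae_gain = relative_improvement(
    mae(observed, baseline_forecast), mae(observed, hybrid_forecast)
)
rmse_gain = relative_improvement(
    rmse(observed, baseline_forecast), rmse(observed, hybrid_forecast)
)
print(f"MAE improvement:  {mae_gain:.2f}%")
print(f"RMSE improvement: {rmse_gain:.2f}%")
```

Under this definition, a positive value means the model's error is lower than the baseline's, so a larger percentage indicates a larger gain over the baseline.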