Atmosphere (Jun 2023)

PM<sub>2.5</sub> Concentration Forecasting Using Weighted Bi-LSTM and Random Forest Feature Importance-Based Feature Selection

  • Baekcheon Kim,
  • Eunkyeong Kim,
  • Seunghwan Jung,
  • Minseok Kim,
  • Jinyong Kim,
  • Sungshin Kim

DOI
https://doi.org/10.3390/atmos14060968
Journal volume & issue
Vol. 14, no. 6
p. 968

Abstract

Particulate matter (PM) in the air can cause various health problems and diseases in humans. In particular, the small size of PM2.5 particles enables them to penetrate deep into the lungs, causing severe health impacts. Exposure to PM2.5 can result in respiratory, cardiovascular, and allergic diseases, and prolonged exposure has also been linked to an increased risk of cancer, including lung cancer. Therefore, forecasting the PM2.5 concentration in the surrounding environment is crucial for preventing these adverse health effects. This paper proposes a method for forecasting the PM2.5 concentration 1 h ahead using bidirectional long short-term memory (Bi-LSTM). The proposed method involves selecting input variables based on the feature importance calculated by random forest, classifying the data to assign weight variables that reduce bias, and forecasting the PM2.5 concentration using Bi-LSTM. To evaluate the performance of the proposed method, two case studies were conducted. The first compares forecasting performance with and without the preprocessing steps. The second compares forecasting performance between deep learning models (long short-term memory, gated recurrent unit, and Bi-LSTM) and conventional machine learning models (multi-layer perceptron, support vector machine, decision tree, and random forest). In case study 1, the proposed method improves the performance indices (RMSE: 3.98%p, MAE: 5.87%p, RRMSE: 3.96%p, and R2: 0.72%p) because weights are assigned according to the input variables before forecasting is performed. In case study 2, we show that Bi-LSTM, which considers both directions (forward and backward), forecasts effectively compared with the conventional models (RMSE: 2.70, MAE: 0.84, RRMSE: 1.97, R2: 0.16). These results show that the proposed method can effectively forecast PM2.5 even when data in the high-concentration section are insufficient.
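
The four performance indices named in the abstract (RMSE, MAE, RRMSE, and R2) can be illustrated with a minimal sketch. The function name `forecast_metrics` and the sample values are hypothetical and are not taken from the paper. The abstract does not define RRMSE, so the sketch assumes one common convention: RMSE divided by the mean of the observed values, expressed as a percentage.

```python
import math


def forecast_metrics(observed, predicted):
    """Return RMSE, MAE, RRMSE (%), and R2 for two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n

    # Sum of squared errors and total sum of squares around the observed mean.
    sse = sum(e * e for e in errors)
    sst = sum((o - mean_obs) ** 2 for o in observed)

    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in errors) / n
    # Assumption: RRMSE is RMSE normalized by the observed mean, in percent.
    rrmse = 100.0 * rmse / mean_obs
    # Coefficient of determination.
    r2 = 1.0 - sse / sst

    return {"RMSE": rmse, "MAE": mae, "RRMSE": rrmse, "R2": r2}


# Made-up hourly PM2.5 values, used only to exercise the function.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
metrics = forecast_metrics(observed, predicted)
print(metrics)
```

Lower RMSE, MAE, and RRMSE indicate a closer fit, while R2 approaches 1 as the forecast explains more of the variance in the observations.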

Keywords