IEEE Access (Jan 2024)

Two-Stage Feature Engineering to Predict Air Pollutants in Urban Areas

  • Fareena Naz,
  • Muhammad Fahim,
  • Adnan Ahmad Cheema,
  • Nguyen Trung Viet,
  • Tuan-Vu Cao,
  • Ruth Hunter,
  • Trung Q. Duong

DOI
https://doi.org/10.1109/ACCESS.2024.3443810
Journal volume & issue
Vol. 12
pp. 114073–114085

Abstract

Air pollution is a global challenge to human health and the ecological environment. Identifying the relationships among pollutants, their fundamental sources, and their detrimental effects on health and mental well-being is critical for implementing appropriate countermeasures. Accurate air pollution prediction is the way forward to address this issue and assess air quality. Such predictions can help governing bodies make prompt, evidence-based decisions and prevent further harm to the urban environment, public health, and climate, with co-benefits for the economy. The main objective of this study is to explore the strength of features. To that end, it proposes a two-stage feature engineering approach that fuses the advantages of influential factors with a decomposition approach and generates an optimum feature combination for five major pollutants: Nitrogen Dioxide (NO2), Ozone (O3), Sulphur Dioxide (SO2), and Particulate Matter (PM2.5 and PM10). The experiments use a publicly available dataset, covering 2015 to 2020, collected from air quality monitoring stations in Belfast, Northern Ireland, UK. In stage-1, new trigonometric and statistical features are created from the dataset to capture their dependency on the target pollutant, and correlation-inspired feature combinations are generated to improve forecasting performance. In stage-2, this is further enhanced by an optimum feature combination that integrates the stage-1 features with features based on Variational Mode Decomposition (VMD). The study employs a simplified Long Short-Term Memory (LSTM) neural network as a single-step forecasting model for multivariate time series data. Three performance indicators are used to evaluate the forecasting model: 1) root mean square error (RMSE), 2) mean absolute error (MAE), and 3) R-squared (R2).
The results demonstrate the effectiveness of the proposed approach, with a 13% improvement in performance (in terms of R2) and the lowest error scores for both RMSE and MAE.
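The stage-1 idea summarized in the abstract (trigonometric and statistical features, kept or dropped by their correlation with the target pollutant) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which time fields are encoded, how long the rolling window is, or what correlation criterion is used, so the hour/month sine-cosine encodings, the trailing rolling mean and standard deviation, the absolute Pearson threshold, and the helper names `trig_features`, `rolling_stats`, `pearson`, and `rank_features` are all hypothetical choices. Stage-2 (VMD) and the LSTM are not shown.

```python
import math
from datetime import datetime, timedelta
from statistics import mean, pstdev


def trig_features(ts: datetime) -> dict:
    """Sine/cosine encodings of hour of day and month of year (hypothetical field choice)."""
    hour_angle = 2 * math.pi * ts.hour / 24
    month_angle = 2 * math.pi * (ts.month - 1) / 12
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "month_sin": math.sin(month_angle),
        "month_cos": math.cos(month_angle),
    }


def rolling_stats(values: list, window: int) -> tuple:
    """Trailing rolling mean and population std; None until the window is full."""
    means, stds = [], []
    for i in range(len(values)):
        if i + 1 < window:
            means.append(None)
            stds.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            means.append(mean(chunk))
            stds.append(pstdev(chunk))
    return means, stds


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_features(features: dict, target: list, threshold: float) -> list:
    """Names of features with |r| >= threshold against the target, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


if __name__ == "__main__":
    # Toy hourly series with a daily cycle: hypothetical data, not the Belfast dataset.
    start = datetime(2015, 1, 1)
    stamps = [start + timedelta(hours=h) for h in range(48)]
    no2 = [20 + 10 * math.sin(2 * math.pi * t.hour / 24) for t in stamps]
    rows = [trig_features(t) for t in stamps]
    features = {key: [row[key] for row in rows] for key in rows[0]}
    print(rank_features(features, no2, threshold=0.5))  # ['hour_sin']
    print(rolling_stats(no2, window=24)[0][-1])  # roughly 20 over one full daily cycle
```

With this toy series only `hour_sin` passes the threshold: the month encodings are constant over two days, and `hour_cos` is uncorrelated with a pure sine cycle. In a real single-step forecasting setup, rolling statistics of the target pollutant should be lagged relative to the value being predicted so that the feature does not leak it.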
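The abstract names its three performance indicators but does not give formulas. The sketch below assumes the standard definitions: RMSE as the square root of the mean squared error, MAE as the mean absolute error, and R2 as the coefficient of determination, 1 - SS_res/SS_tot. The function names and the observed/predicted values are illustrative only and are not results from the study.

```python
import math


def _check(y_true, y_pred):
    """Reject empty or mismatched inputs."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")


def rmse(y_true, y_pred):
    """Root mean square error."""
    _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant target")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical observed vs predicted concentrations, not data from the paper.
    observed = [30.0, 25.0, 40.0, 35.0]
    predicted = [28.0, 27.0, 38.0, 37.0]
    print(rmse(observed, predicted))          # 2.0
    print(mae(observed, predicted))           # 2.0
    print(round(r2(observed, predicted), 3))  # 0.872
```

Because RMSE squares each error before averaging, it penalizes occasional large misses (such as pollution spikes) more heavily than MAE does, which is one reason to report both alongside R2.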

Keywords