Aerosol and Air Quality Research (Nov 2024)

Leveraging Satellite Data for Predicting PM10 Concentration with Machine Learning Models: A Study in the Plains of North Bengal, India

  • Ayan Das
  • Manoranjan Sahu

DOI
https://doi.org/10.4209/aaqr.240066
Journal volume & issue
Vol. 24, no. 12
pp. 1 – 17

Abstract

The current air quality monitoring network is sparse, and expanding it into remote areas is economically impractical. Remote sensing offers an effective solution, providing real-time observations with high spatial and temporal resolution. This study estimated PM10 concentrations in Siliguri City, West Bengal, from 2019 to 2022 using Aerosol Optical Depth (AOD) at a 10 × 10 km spatial resolution. During the study period, the average PM10 level was 141.89 µg m⁻³, exceeding India’s National Ambient Air Quality Standards (NAAQS). Five machine learning regression models, namely Linear Regression (LR), Support Vector Regression (SVR), Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGB), were used to predict the daily ground-level PM10 concentration from AOD, land cover data, and meteorological parameters, and were evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R². Statistical testing showed that dew point temperature, precipitation, and the Normalized Difference Vegetation Index (NDVI) had a statistically significant relationship with AOD (p < 0.05). Tree-based regression models, particularly RF, outperformed the other models; RF achieved an R² of 0.83 and an RMSE of 25.51 µg m⁻³, with an average Mean Absolute Percentage Error (MAPE) of 15.43% on the test dataset. The RF model also identified NDVI as the most important parameter in the analysis. To assess model transferability, all five models were used to predict PM10 concentrations in the Jalpaiguri region, with National Air Quality Monitoring Programme (NAMP) data as the reference. The average MAPE at the validation site was 25.37%, 30.66%, 12.86%, 18.27%, and 24.93% for the LR, SVR, RF, GB, and XGB models, respectively. The RF model excelled in managing complex non-linear interactions typical of regional environments. This study advances research in India’s data-scarce, challenging regions by using satellite data for more accurate air quality predictions.
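
For readers unfamiliar with the four evaluation metrics named in the abstract (RMSE, MAE, MAPE, and R²), the following is a minimal sketch of how they are commonly computed from paired observed and predicted values. It is not the authors' code: the function name `regression_metrics` and the sample numbers are hypothetical, and R² is taken here as the coefficient of determination, which may differ from the definition used in the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MAE, MAPE (in %) and R² for paired values.

    Hypothetical helper for illustration; not taken from the study.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    # Root Mean Squared Error: penalises large errors more heavily.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Mean Absolute Error: average magnitude of the errors.
    mae = sum(abs(e) for e in errors) / n

    # Mean Absolute Percentage Error: undefined if any observed value is 0.
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n

    # Coefficient of determination: 1 - residual SS / total SS
    # (undefined if all observed values are identical).
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Made-up PM10 values in µg m⁻³, purely for demonstration.
observed = [100.0, 150.0, 200.0]
predicted = [110.0, 140.0, 190.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Because MAPE divides by the observed value, it is scale-free, which makes it convenient for comparing model performance between sites with different concentration levels, such as a training site and a separate validation site.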

Keywords