Annals of GIS (Jul 2023)
Modelling PM2.5 for Data-Scarce Zone of Northwestern India using Multi Linear Regression and Random Forest Approaches
Abstract
PM2.5 (particulate matter with an aerodynamic diameter <2.5 µm) concentrations above the permissible limit degrade air quality and harm human health. Because the region lacks a dense spatial network of ground-based PM monitoring sites and systematic checking, continuous PM2.5 concentration data at macro and meso scales are limited. The present research estimated PM2.5 concentrations at high (1 km) resolution over Faridabad, Ghaziabad, Gurugram and Gautam Buddha Nagar, a data-scarce zone in the highly urbanized area of northwestern India, for the year 2019 using Random Forest (RF) and Multi-Linear Regression (MLR) models and a hybrid model combining RF and MLR. The model inputs included Aerosol Optical Depth (AOD), meteorological data and limited in-situ PM2.5 data. For validation, the correlation coefficient (R), Root-Mean-Square Error (RMSE), Mean Absolute Error (MAE) and Relative Prediction Error (RPE) were used. The hybrid model estimated PM2.5 with a higher correlation (R = 0.865) and a smaller RPE (22.41%) than the standalone MLR and RF models. Despite the inadequate in-situ data, Greater Noida was found to have a high correlation (R = 0.933) and a low RPE (32.13%) in the hybrid model. The most polluted seasons of the year are winter (137.28 µg m−3) and post-monsoon (112.93 µg m−3), whereas the wet monsoon season (44.56 µg m−3) is the cleanest. The highest PM2.5 level was recorded in Noida, followed by Ghaziabad, Greater Noida and Faridabad. The findings of the present research will provide an input dataset for air pollution exposure risk research in parts of northwestern India with sparse monitoring data.
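The four validation statistics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function name `validation_metrics` and the sample values are hypothetical, and the abstract does not define RPE, so the sketch assumes one common convention: RMSE divided by the mean observed concentration, expressed as a percentage.

```python
import math


def validation_metrics(observed, predicted):
    """Return R, RMSE, MAE and RPE (%) for paired observed/predicted values."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Pearson correlation coefficient R between observations and predictions.
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    r = cov / math.sqrt(var_obs * var_pred)

    # Error-based statistics, in the units of the input (e.g. µg m−3).
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # ASSUMPTION: RPE is taken as RMSE relative to the mean observation, in %.
    rpe = 100.0 * rmse / mean_obs

    return {"R": r, "RMSE": rmse, "MAE": mae, "RPE": rpe}


# Hypothetical PM2.5 values, for illustration only (not data from the study).
observed = [40.0, 60.0, 100.0, 140.0]
predicted = [50.0, 55.0, 90.0, 150.0]
metrics = validation_metrics(observed, predicted)
```

Under this convention RPE is unitless, so it can be compared across sites and seasons with very different mean concentrations, whereas RMSE and MAE stay in concentration units.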
Keywords