E3S Web of Conferences (Jan 2024)
Improving Machine Learning Based PM2.5 Prediction by Segregating Biomass Emission Factor from Chemical Transport Model
Abstract
Located in the heart of Mainland Southeast Asia, Thailand is affected by intense biomass burning (BB) activity, both local and from neighbouring countries. The seasonal pattern of BB makes it a potential predictor of PM2.5 concentration. We therefore enhanced machine-learning-based PM2.5 prediction by segregating the BB factor from the output of the Community Multiscale Air Quality (CMAQ) model. Two Light Gradient Boosting Machine (LightGBM) models with different CMAQ predictors were developed: the BB-integrated model, which incorporated CMAQ-simulated PM2.5 from all emission sources, and the BB-segregated model, which incorporated CMAQ-simulated PM2.5 from sources other than BB (CMAQ_PM25_Othr) and CMAQ-simulated PM2.5 from BB emissions (CMAQ_PM25_BB). The two models shared the same control predictors, which included simulated meteorological variables from the WRF model, population, elevation, and land-use variables, and both were evaluated using cross-validation (CV). The BB-segregated model outperformed the BB-integrated model, with overall-CV R² values of 0.86 and 0.82, respectively. Feature-importance analysis of the BB-segregated model indicates that CMAQ_PM25_Othr and CMAQ_PM25_BB are the two most important predictors. These findings emphasize the importance of considering BB emissions when predicting PM2.5 concentrations, particularly in regions with high BB activity.
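The experimental design described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It only shows how the two predictor sets differ and how a cross-validated R² could be scored. The names `CMAQ_PM25_Othr` and `CMAQ_PM25_BB` come from the abstract. The name `CMAQ_PM25_All`, the control-predictor column names, the fold scheme, and the helper functions are assumptions made for illustration. Model fitting itself (LightGBM in the study) is left out.

```python
from statistics import mean

# Control predictors shared by both models. These column names are
# hypothetical; the abstract only lists the categories (WRF meteorology,
# population, elevation, land use).
CONTROL_PREDICTORS = [
    "wrf_temperature",
    "wrf_wind_speed",
    "population",
    "elevation",
    "land_use",
]

# BB-integrated model: a single CMAQ predictor covering all emission sources.
# "CMAQ_PM25_All" is an assumed name, not taken from the source.
BB_INTEGRATED = ["CMAQ_PM25_All"] + CONTROL_PREDICTORS

# BB-segregated model: non-BB and BB contributions as separate predictors
# (names as given in the abstract).
BB_SEGREGATED = ["CMAQ_PM25_Othr", "CMAQ_PM25_BB"] + CONTROL_PREDICTORS


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs using a simple interleaved split.

    The fold scheme is an assumption; the abstract does not specify it.
    """
    for fold in range(n_folds):
        test_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, test_idx


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

In such a setup, each model would be trained on the training indices of every fold using its own predictor list, and `r_squared` would be applied to the held-out predictions. The only difference between the two runs is the CMAQ-derived columns.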