Quantifying the Impact of Multiple Factors on Air Quality Model Simulation Biases Using Machine Learning
Chunying Fan, Ruilin Wang, Ge Song, Mengfan Teng, Maolin Zhang, Huangchuan Liu, Zhujun Li, Siwei Li, Jia Xing
Affiliations
Chunying Fan
Hubei Key Laboratory of Quantitative Remote Sensing of Land and Atmosphere, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Ruilin Wang
Institute of Software, Chinese Academy of Sciences, Beijing 100864, China
Ge Song
Hubei Key Laboratory of Quantitative Remote Sensing of Land and Atmosphere, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Mengfan Teng
Hubei Key Laboratory of Quantitative Remote Sensing of Land and Atmosphere, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Maolin Zhang
Hubei Key Laboratory of Quantitative Remote Sensing of Land and Atmosphere, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Huangchuan Liu
Hubei Key Laboratory of Quantitative Remote Sensing of Land and Atmosphere, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Zhujun Li
Hubei Key Laboratory of Quantitative Remote Sensing of Land and Atmosphere, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Siwei Li
Hubei Key Laboratory of Quantitative Remote Sensing of Land and Atmosphere, School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Jia Xing
Department of Civil and Environmental Engineering, The University of Tennessee, Knoxville, TN 37996, USA
Accurate prediction of air pollutant concentrations is essential for addressing environmental and public health concerns. Air quality models such as WRF-CMAQ simulate these concentrations but often deviate substantially from observations. To identify the sources of these biases, we applied the XGBoost machine learning algorithm to assess the performance of WRF-CMAQ in predicting air pollutants across two regions in China. XGBoost models trained on observations achieved high accuracy (R > 0.95), indicating that the selected features effectively capture pollutant variations. When trained on WRF-CMAQ inputs, XGBoost still improved on the WRF-CMAQ simulation but revealed biases attributable to both model inputs (10–60%) and model mechanisms (1–30%). The analysis identified previous-hour pollutant levels as the largest contributor to bias, followed by meteorological variables. These results highlight the need to improve both model inputs and model mechanisms to enhance future air quality predictions and support pollution control strategies.
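The split between input-related and mechanism-related bias can be illustrated with a small arithmetic sketch. The abstract does not give the exact attribution procedure, so the decomposition below is an assumption: the error removed by replacing the WRF-CMAQ mechanisms with a learned mapping on the same simulated inputs is assigned to mechanisms, and the further error removed by switching to observed inputs is assigned to inputs. The function names (`rmse`, `attribute_bias`) and all concentration values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(pred, obs):
    """Root-mean-square error between two equal-length sequences."""
    if len(pred) != len(obs) or not obs:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def attribute_bias(err_cmaq, err_ml_sim_inputs, err_ml_obs_inputs):
    """Split the raw model error into assumed shares (fractions of err_cmaq).

    err_cmaq          -- error of the raw WRF-CMAQ simulation
    err_ml_sim_inputs -- error of an ML model trained on simulated inputs
    err_ml_obs_inputs -- error of an ML model trained on observed inputs

    Assumed definitions (not taken from the paper):
      mechanism share = error removed by learning the mapping on simulated inputs
      input share     = further error removed by using observed inputs
      residual share  = error left even with observed inputs
    """
    if err_cmaq <= 0:
        raise ValueError("err_cmaq must be positive")
    return {
        "mechanism": (err_cmaq - err_ml_sim_inputs) / err_cmaq,
        "inputs": (err_ml_sim_inputs - err_ml_obs_inputs) / err_cmaq,
        "residual": err_ml_obs_inputs / err_cmaq,
    }


# Hypothetical hourly concentrations (arbitrary units), for illustration only.
obs = [50.0, 60.0, 70.0, 80.0]
cmaq = [90.0, 100.0, 110.0, 120.0]    # raw simulation
ml_sim = [80.0, 90.0, 100.0, 110.0]   # ML model fed simulated inputs
ml_obs = [60.0, 70.0, 80.0, 90.0]     # ML model fed observed inputs

shares = attribute_bias(rmse(cmaq, obs), rmse(ml_sim, obs), rmse(ml_obs, obs))
for name, value in shares.items():
    print(f"{name}: {value:.0%}")
```

By construction the three shares sum to one, which makes them easy to compare across pollutants or regions. Other error metrics, such as mean bias, could be substituted for RMSE under the same scheme.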