Atmospheric Chemistry and Physics (Nov 2021)

Estimation of the vertical distribution of particle matter (PM2.5) concentration and its transport flux from lidar measurements based on machine learning algorithms

  • Y. Ma,
  • Y. Zhu,
  • B. Liu,
  • H. Li,
  • S. Jin,
  • Y. Zhang,
  • R. Fan,
  • W. Gong

DOI
https://doi.org/10.5194/acp-21-17003-2021
Journal volume & issue
Vol. 21
pp. 17003 – 17016

Abstract

The vertical distribution of the aerosol extinction coefficient (EC) measured by lidar systems has been used to retrieve the profile of particulate matter with a diameter of less than 2.5 µm (PM2.5). However, the traditional linear model (LM) cannot adequately account for the influence of multiple meteorological variables, which results in low inversion accuracy. Machine learning (ML) algorithms can take multiple input features, which may offer a new way to overcome this limitation. In this study, surface aerosol EC and meteorological data from January 2014 to December 2017 were used to explore the conversion of aerosol EC to PM2.5 concentrations. Four ML algorithms were used to train the PM2.5 prediction models: random forest (RF), K-nearest neighbor (KNN), support vector machine (SVM) and extreme gradient boosting decision tree (XGB). The mean absolute errors (root mean square errors) of the LM, RF, KNN, SVM and XGB models were 11.66 (15.68), 5.35 (7.96), 7.95 (11.54), 6.96 (11.18) and 5.62 (8.27) µg m⁻³, respectively. This result shows that the RF model is the most suitable model for PM2.5 inversions from EC and meteorological data. A sensitivity analysis of the model input parameters was also conducted. These results further indicate that it is necessary to consider the effect of meteorological variables when using EC to retrieve PM2.5 concentrations. Finally, the diurnal and seasonal variations of the transport flux (TF) and PM2.5 profiles were analyzed based on the lidar data. High PM2.5 concentrations occurred at approximately 13:00–17:00 local time (LT) at heights of 0.2–0.8 km. The diurnal variations of the TF show a clear conveyor belt at approximately 12:00–18:00 LT at 0.5–0.8 km, indicating that air pollutant transport over Wuhan mainly occurs at this time and height range. The TF near the ground usually has its highest value in winter (0.26 mg m⁻² s⁻¹), followed by autumn and summer (0.20 and 0.19 mg m⁻² s⁻¹, respectively), and its lowest value in spring (0.14 mg m⁻² s⁻¹). These findings provide important information on the atmospheric profile and give sufficient confidence to apply lidar to air quality monitoring.
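
As a rough illustration of the quantities reported in the abstract, the sketch below computes the mean absolute error and root mean square error for a set of predictions, and a transport flux value in mg m⁻² s⁻¹. It is a minimal sketch under stated assumptions, not the authors' code: the function names and sample values are hypothetical, and the flux is assumed here to be the product of PM2.5 concentration and horizontal wind speed, a definition the abstract itself does not give.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_square_error(observed, predicted):
    """Square root of the average squared difference."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def transport_flux(pm25_ug_m3, wind_speed_m_s):
    """Transport flux in mg m^-2 s^-1.

    ASSUMPTION: flux = PM2.5 concentration x horizontal wind speed.
    The concentration is converted from ug m^-3 to mg m^-3 first.
    """
    return (pm25_ug_m3 / 1000.0) * wind_speed_m_s


# Hypothetical sample values in ug m^-3, for illustration only.
observed = [35.0, 50.0, 80.0, 120.0]
predicted = [40.0, 45.0, 90.0, 110.0]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_square_error(observed, predicted)

# Hypothetical near-surface case: 100 ug m^-3 at a wind speed of 2.6 m/s.
flux = transport_flux(100.0, 2.6)

print(f"MAE = {mae:.2f} ug/m3, RMSE = {rmse:.2f} ug/m3")
print(f"TF = {flux:.2f} mg/(m2 s)")
```

Because RMSE squares each error before averaging, it weights large misses more heavily than MAE. This is consistent with every model in the abstract having an RMSE larger than its MAE.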