Scientific Data (Feb 2024)

A city-level dataset of heavy metal emissions into the atmosphere across China from 2015–2020

  • Qi Dong,
  • Yue Li,
  • Xinhua Wei,
  • Le Jiao,
  • Lina Wu,
  • Zexin Dong,
  • Yi An

DOI
https://doi.org/10.1038/s41597-024-03089-3
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 12

Abstract

The absence of nationwide distribution data on heavy metal emissions into the atmosphere is a significant constraint on environmental research and public health assessment. To address this data gap, we established a dataset covering Cr, Cd, As, and Pb emissions into the atmosphere (HMEAs, unit: ton) across 367 municipalities in China. We first collected HMEAs data and covariates such as industrial emissions, vehicle emissions, and meteorological variables, among ten other indicators. We then assessed nine machine learning models, namely Linear Regression (LR), Ridge, Bayesian Ridge (Bayesian), K-Neighbors Regressor (KNN), MLP Regressor (MLP), Random Forest Regressor (RF), LGBM Regressor (LGBM), Lasso, and ElasticNet, using the coefficient of determination (R²), root-mean-square error (RMSE), and mean absolute error (MAE) on the testing dataset. The RF and LGBM models were chosen for their favorable predictive performance (R²: 0.58–0.84, with lower RMSE and MAE than the other models), which supports their robustness in modelling. This dataset serves as a valuable resource for informing environmental policies, monitoring air quality, conducting environmental assessments, and facilitating academic research.
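The three evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, chosen only to show how the coefficient of determination, RMSE, and MAE are computed from paired observed and predicted values on a test set.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


# Made-up values for illustration only; these are not from the dataset.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")
```

A higher R² together with lower RMSE and MAE indicates a better fit, which is the basis on which the abstract reports RF and LGBM as the preferred models.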