Ecological Indicators (Sep 2024)

Quantifying seasonal variations in pollution sources with machine learning-enhanced positive matrix factorization

  • Yaotao Xu,
  • Peng Li,
  • Minghui Zhang,
  • Lie Xiao,
  • Bo Wang,
  • Xiaoming Zhang,
  • Yunqi Wang,
  • Peng Shi

Journal volume & issue
Vol. 166
p. 112543

Abstract

As industrialization and urbanization accelerate, water quality management faces growing challenges, and traditional pollutant source apportionment methods often prove inadequate for complex environmental data. This study improves the precision and reliability of pollutant source identification by integrating the Positive Matrix Factorization (PMF) model with several machine learning techniques. Using data from 17 water quality monitoring stations along the Wuding River from 2017 to 2021, we employed Random Forest (RF), Support Vector Machine (SVM), Elastic Net (EN), and Extreme Gradient Boosting (XGBoost) models to predict the Water Quality Index (WQI) in the dry and wet seasons. The RF model performed best in the dry season (R² = 0.93), while the SVM performed best in the wet season (R² = 0.94). SHAP (SHapley Additive exPlanations) analysis identified CODMn and NH₃-N as significant influences on WQI in the dry season, whereas COD, BOD, and TP gained prominence in the wet season. Because SHAP values quantify the contribution of each feature to the model output, they make the models more transparent and interpretable. In addition, the feature importance identified by machine learning was used as weights to optimize the contribution rates predicted by the PMF model. The optimized model identified the contribution of domestic and farm effluent discharges more accurately in the dry season, raising their identified share from 19.4% to 45.4%, and it assigned a larger contribution to agricultural non-point sources and domestic effluent in the wet season. This research offers a novel perspective on the characteristics of river water pollution and has significant implications for formulating data-driven environmental management strategies.
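The abstract does not say exactly how feature importance enters the PMF contribution rates, so the following is only a minimal sketch of one plausible scheme, not the authors' procedure. It assumes a PMF solution is already available as a contribution matrix `G` (samples by sources) and a profile matrix `F` (sources by species), and that `importance` holds one non-negative weight per species (for example, normalized mean absolute SHAP values). The function name, the toy numbers, and the weighting rule itself are all hypothetical.

```python
import numpy as np

def weighted_source_shares(G, F, importance):
    """Importance-weighted source contribution shares (illustrative only).

    G          : (n_samples, n_sources) PMF source contributions
    F          : (n_sources, n_species) PMF source profiles
    importance : (n_species,) non-negative weights, e.g. from SHAP
    Returns an array of length n_sources that sums to 1.
    """
    G = np.asarray(G, dtype=float)
    F = np.asarray(F, dtype=float)
    w = np.asarray(importance, dtype=float)
    if np.any(w < 0) or w.sum() <= 0:
        raise ValueError("importance must be non-negative with a positive sum")
    w = w / w.sum()  # normalize weights so they sum to 1

    # Total mass of each species attributed to each source over all samples.
    mass = G.sum(axis=0)[:, None] * F  # shape (n_sources, n_species)
    totals = mass.sum(axis=0)          # per-species total mass
    if np.any(totals <= 0):
        raise ValueError("every species needs a positive reconstructed mass")

    # Fraction of each species explained by each source (columns sum to 1).
    species_share = mass / totals

    # Average those fractions across species, weighted by importance.
    return species_share @ w

# Toy data (hypothetical): 2 sources, 3 species, 2 samples.
G_demo = [[1.0, 0.5], [1.0, 0.5]]
F_demo = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]

uniform_shares = weighted_source_shares(G_demo, F_demo, [1, 1, 1])
weighted_shares = weighted_source_shares(G_demo, F_demo, [0.1, 0.8, 0.1])
```

In this toy case the second species comes entirely from the second source, so giving that species a high weight moves the second source's share upward relative to uniform weighting. That mirrors, in spirit only, how a reweighted apportionment can shift the share attributed to a source.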

Keywords