Water (Mar 2023)

Groundwater Contamination Site Identification Based on Machine Learning: A Case Study of Gas Stations in China

  • Yanpeng Huang,
  • Longzhen Ding,
  • Weijiang Liu,
  • Haobo Niu,
  • Mengxi Yang,
  • Guangfeng Lyu,
  • Sijie Lin,
  • Qing Hu

DOI
https://doi.org/10.3390/w15071326
Journal volume & issue
Vol. 15, no. 7
p. 1326

Abstract

Read online

Accurately identifying groundwater contamination sites is vital for groundwater protection and restoration. This study aims to use a machine learning (ML) approach to identify groundwater contamination sites with total petroleum hydrocarbons (TPH) as target contaminants in a case study of gas stations in China. Firstly, six classical ML algorithms, including logistic regression, decision tree, gradient boosting decision tree (GBDT), random forest, multi-layer perceptron, and support vector machine, were applied to develop the identification models of TPH-contaminated groundwater with 40 features and the performances were compared. The comparison results showed that the GBDT model achieves the best prediction performance, with F1 score of 1 and AUC value of 1. Next, Bayesian optimization optimized GBDT (BO-GBDT) was conducted to further decrease the training time from 19,125 s to 513 s while maintaining the same prediction performance (F1 score = 1, AUC = 1). Finally, Shapley additive explanations (SHAP) analysis was performed on the BO-GBDT model. The SHAP results displayed that the critical feature variables in the BO-GBDT model include wind, population, evaporation, total potassium in the soil, precipitation, and leakage accident. This study demonstrated that BO-GBDT is one satisfactory model to identify groundwater TPH-contamination at gas stations. The method proposed in this study has the potential to be applied to other types of groundwater contamination sites.

Keywords