Water (Mar 2023)
Groundwater Contamination Site Identification Based on Machine Learning: A Case Study of Gas Stations in China
Abstract
Accurately identifying groundwater contamination sites is vital for groundwater protection and restoration. This study aims to use a machine learning (ML) approach to identify groundwater contamination sites with total petroleum hydrocarbons (TPH) as target contaminants in a case study of gas stations in China. Firstly, six classical ML algorithms, including logistic regression, decision tree, gradient boosting decision tree (GBDT), random forest, multi-layer perceptron, and support vector machine, were applied to develop the identification models of TPH-contaminated groundwater with 40 features and the performances were compared. The comparison results showed that the GBDT model achieves the best prediction performance, with F1 score of 1 and AUC value of 1. Next, Bayesian optimization optimized GBDT (BO-GBDT) was conducted to further decrease the training time from 19,125 s to 513 s while maintaining the same prediction performance (F1 score = 1, AUC = 1). Finally, Shapley additive explanations (SHAP) analysis was performed on the BO-GBDT model. The SHAP results displayed that the critical feature variables in the BO-GBDT model include wind, population, evaporation, total potassium in the soil, precipitation, and leakage accident. This study demonstrated that BO-GBDT is one satisfactory model to identify groundwater TPH-contamination at gas stations. The method proposed in this study has the potential to be applied to other types of groundwater contamination sites.
Keywords