Ecological Indicators (Sep 2024)
Machine learning and explainable AI for chlorophyll-a prediction in Namhan River Watershed, South Korea
Abstract
Algal blooms are a primary concern in freshwater quality management, so predicting algal concentrations is crucial. Chlorophyll-a (Chl-a) is a widely used indicator of algal concentration. This study focuses on the downstream watershed of the Namhan River, a significant water source for the Korean metropolitan area. Using 25 input variables, we developed an eXtreme Gradient Boosting (XGB) model to predict Chl-a concentrations in Yangpyeong. The developed XGB model showed high predictive performance (R² = 0.9487, RMSE = 3.1661, RSR = 0.2781). To assess how model predictability varies with the input variables, tree-model-based Feature Importance (Tree-FI) and SHapley Additive exPlanations (SHAP)-based feature importance (SHAP-FI) were used. The study validates the utility of eXplainable Artificial Intelligence (XAI) through SHAP and Partial Dependence Plot (PDP) analyses, which reveal that pH and turbidity in Yangpyeong, and Chl-a in Hongcheon, contribute positively to Chl-a concentrations in Yangpyeong. The analyses also identify complex interactions among water quality variables that affect Chl-a concentrations, highlighting the intricate relationships involved in algal bloom prediction and management. This research underscores the value of integrating machine learning models with XAI techniques to address real-world environmental challenges, providing useful tools for effective algal bloom prevention and management strategies.
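
The three performance metrics reported above can be illustrated with a minimal sketch. The abstract does not state the exact formulas used, so the definitions below are assumptions based on common usage: R² as the coefficient of determination (1 - SSE/SST), RMSE as the root mean squared error, and RSR as RMSE divided by the population standard deviation of the observations. The function names and the sample values are hypothetical and are not taken from the study's data.

```python
import math
from statistics import fmean, pstdev


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSE/SST (assumed definition)."""
    mean_obs = fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rsr(observed, predicted):
    """RMSE divided by the standard deviation of the observations.

    Assumption: the population standard deviation (ddof = 0) is used.
    """
    return rmse(observed, predicted) / pstdev(observed)


# Hypothetical Chl-a observations and model predictions (illustrative only).
obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.5, 3.5, 6.5, 7.5]

print(f"R2   = {r_squared(obs, pred):.4f}")
print(f"RMSE = {rmse(obs, pred):.4f}")
print(f"RSR  = {rsr(obs, pred):.4f}")
```

Lower RMSE and RSR and higher R² indicate better agreement between predictions and observations. RMSE carries the units of the target variable, whereas RSR is dimensionless, which makes it easier to compare across sites with different Chl-a ranges.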