Environmental Challenges (Dec 2024)

Robust prediction of chlorophyll-a from nitrogen and phosphorus content in Philippine and global lakes using fine-tuned, explainable machine learning

  • Karl Ezra Pilario,
  • Eric Jan Escober,
  • Aurelio de los Reyes V,
  • Maria Pythias Espino

Journal volume & issue
Vol. 17
p. 101056

Abstract

Chlorophyll-a (Chl-a) content in waterbodies is a primary indicator of algal biomass and is used to detect impending harmful algal blooms. This paper presents a methodology that uses eight widely used machine learning (ML) models to estimate Chl-a concentration from nutrient content in lakes. Unlike previous work, we introduce three novel steps: (i) Bayesian optimization to fine-tune ML hyper-parameters and improve performance; (ii) explainability methods to identify the inputs that most influence Chl-a prediction; and (iii) robustness analysis to assess how the models are affected by measurement noise. We tested the approach on two case studies: Laguna Lake, Philippines, and a set of lakes from Japan, the United States of America, Canada, and Uganda. Fine-tuned Kernel Ridge Regression and Gaussian Process Regression were consistently the most accurate (>80%) and robust models in both case studies. In Laguna Lake, Shapley explanations revealed that phosphate and nitrate ions are the most important predictors of Chl-a, whereas total phosphorus is the most important predictor for the global lakes. We therefore suggest that these parameters be monitored more closely to detect algal blooms. By making our code accessible, we hope that our methods can serve as a benchmark for the data-driven modeling of Chl-a content in lakes and aid lake management through model deployment.
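
To make the robustness-analysis step in the abstract more concrete, the following is a minimal sketch, not the authors' code. It fits a kernel ridge regression with an RBF kernel using NumPy only, then measures how the test-set R² changes as Gaussian noise is added to the inputs. Everything specific here is an assumption for illustration: the synthetic data stand in for nutrient measurements, the hyper-parameters `gamma` and `lam` are fixed by hand rather than tuned by Bayesian optimization, the noise is scaled by each feature's standard deviation, and R² is used as the accuracy measure.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    # Squared Euclidean distance between every row of A and every row of B.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_krr(X, y, gamma, lam):
    # Kernel ridge regression: solve (K + lam * I) coef = y.
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def predict_krr(X_train, coef, X_new, gamma):
    return rbf_kernel(X_new, X_train, gamma) @ coef


def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def robustness_curve(X_train, y_train, X_test, y_test, noise_levels,
                     gamma=5.0, lam=1e-2, n_repeats=20, seed=0):
    """Mean test R^2 at each input-noise level (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    coef = fit_krr(X_train, y_train, gamma, lam)
    scale = X_train.std(axis=0)  # assumed noise model: fraction of feature std
    scores = []
    for level in noise_levels:
        reps = []
        for _ in range(n_repeats):
            noise = rng.normal(0.0, 1.0, X_test.shape) * level * scale
            pred = predict_krr(X_train, coef, X_test + noise, gamma)
            reps.append(r2_score(y_test, pred))
        scores.append(float(np.mean(reps)))
    return scores


# Synthetic stand-in data (NOT lake measurements): two inputs, one target.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 1.0, size=(200, 2))
y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0.0, 0.05, 200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

noise_levels = [0.0, 0.1, 0.5]
scores = robustness_curve(X_train, y_train, X_test, y_test, noise_levels)
for level, score in zip(noise_levels, scores):
    print(f"noise level {level:.1f}: mean R^2 = {score:.3f}")
```

A model whose score falls slowly as the noise level grows would count as more robust under this particular noise model; other noise models or metrics could rank models differently.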

Keywords