Ecological Indicators (Jan 2024)

Prediction of phytoplankton biomass and identification of key influencing factors using interpretable machine learning models

  • Yi Xu,
  • Di Zhang,
  • Junqiang Lin,
  • Qidong Peng,
  • Xiaohui Lei,
  • Tiantian Jin,
  • Jia Wang,
  • Ruifang Yuan

Journal volume & issue
Vol. 158
p. 111320

Abstract


The water quality of the Middle Route of the South-to-North Water Diversion Project (MRP) of China affects the health and safety of about 85 million people. In recent years, repeated episodes of abnormal algal proliferation in the water conveyance channels have posed potential risks to the quality of the diverted water. To clarify the growth dynamics of phytoplankton and evaluate how water environmental parameters relate to phytoplankton in the MRP, this study used a long-term monitoring dataset collected at seven monitoring sites from May 2015 to December 2020 to statistically analyse the spatiotemporal characteristics of planktonic algal cell density (ACD). On this basis, four machine learning models, namely multiple linear regression (MLR), support vector regression (SVR), random forest (RF), and extreme gradient boosting (XGBoost), were constructed to predict ACD, and the SHapley Additive exPlanations (SHAP) method was used to interpret the best-performing model and identify the response relationships between water environmental parameters and ACD. The results showed that (1) hierarchical clustering analysis (HCA) divided the seven monitoring sites into two significantly different groups (Group I and Group II), with higher ACD in Group II than in Group I; (2) across the evaluation metrics used, the RF model predicted ACD variations in both Group I and Group II more accurately than the other three models; and (3) SHAP analysis revealed that the key factors affecting ACD variations in Group I were water temperature (WT), water depth (WD), permanganate index (PI), flow rate (Q), and dissolved oxygen (DO), and that ACD was inhibited when WT was below 23 °C, WD exceeded 6.5 m, and Q was between 100 and 250 m³/s.
Additionally, WT, PI, and DO were the most important predictors of ACD in Group II, where ACD was inhibited when WT was below 24 °C, PI was lower than 2.4 mg/L, and DO was higher than 9 mg/L. This research provides a theoretical basis and reference for water quality management and algal ecological control in water transfer projects with high spatial heterogeneity.
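SHAP attributes a model's prediction to its input features using Shapley values. The brute-force sketch below computes exact interventional Shapley values for a small model by enumerating every feature coalition, with features outside a coalition filled in from a background dataset. It is only meant to make the idea concrete and scales to a handful of features at most; the `shap` library relies on faster estimators (for example `TreeExplainer` for tree ensembles such as RF). The function `toy_acd_model`, the choice of three features (WT, WD, DO), and all numbers are hypothetical illustrations, not the study's fitted model or the MRP monitoring data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def coalition_value(model: Model, x: Sequence[float],
                    background: List[Sequence[float]], subset: set) -> float:
    """Mean model output when features in `subset` take their values from `x`
    and all other features take their values from each background row."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model: Model, x: Sequence[float],
                   background: List[Sequence[float]]) -> List[float]:
    """Exact Shapley values by enumerating all coalitions (O(2^n) evaluations)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                gain = (coalition_value(model, x, background, s | {i})
                        - coalition_value(model, x, background, s))
                phi[i] += weight * gain
    return phi


# Hypothetical toy model of log10 ACD from [WT (°C), WD (m), DO (mg/L)];
# NOT the RF model from the study, just an interacting function to explain.
def toy_acd_model(f: Sequence[float]) -> float:
    wt, wd, do = f
    return 0.2 * wt - 0.3 * wd - 0.01 * wt * do


# Made-up illustrative inputs, not MRP monitoring data.
background = [[18.0, 7.0, 10.0], [22.0, 6.0, 9.0], [26.0, 5.0, 8.0]]
sample = [27.0, 5.5, 7.5]

for name, value in zip(["WT", "WD", "DO"],
                       shapley_values(toy_acd_model, sample, background)):
    print(f"{name}: {value:+.3f}")
```

By construction, the three values sum to the difference between the model's prediction for `sample` and its mean prediction over `background`, which is the additivity property that SHAP explanations rely on.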
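The grouping of monitoring sites by HCA can be illustrated with a small agglomerative (bottom-up) clustering sketch: every site starts as its own cluster, and the two closest clusters are merged repeatedly until the desired number of groups remains. The abstract does not state the linkage criterion or distance measure, so average linkage with Euclidean distance is an assumption here (Ward's method is another common choice). The site names `A` to `G` and their profiles are invented for illustration. In practice `scipy.cluster.hierarchy.linkage` together with `fcluster` would normally replace this hand-rolled loop.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence


def average_linkage(a: List[int], b: List[int],
                    points: List[Sequence[float]]) -> float:
    """Mean Euclidean distance between every member of cluster a and cluster b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))


def agglomerate(points: List[Sequence[float]], k: int) -> List[List[int]]:
    """Start with singleton clusters and merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: average_linkage(clusters[pq[0]], clusters[pq[1]], points),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # p < q, so index p is unaffected by this deletion
    return clusters


# Hypothetical per-site profiles (e.g. seasonal mean log10 ACD for spring,
# summer, autumn, winter); site names and values are invented for illustration.
profiles: Dict[str, List[float]] = {
    "A": [5.0, 5.6, 5.3, 4.8],
    "B": [5.1, 5.7, 5.2, 4.9],
    "C": [4.9, 5.5, 5.4, 4.7],
    "D": [6.1, 6.8, 6.4, 5.6],
    "E": [6.0, 6.9, 6.5, 5.7],
    "F": [6.2, 6.7, 6.3, 5.5],
    "G": [6.1, 6.6, 6.6, 5.8],
}

names = list(profiles)
groups = agglomerate(list(profiles.values()), k=2)
for g in groups:
    print(sorted(names[i] for i in g))
```

Stopping at `k=2` mirrors the two-group split reported in the abstract; with real data, the cut level would usually be chosen from the dendrogram rather than fixed in advance.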

Keywords