Ecological Informatics (Sep 2024)

Investigating the influence of measurement uncertainty on chlorophyll-a predictions as an indicator of harmful algal blooms in machine learning models

  • I. Busari,
  • D. Sahoo,
  • K.P. Sudheer,
  • R.D. Harmel,
  • C. Privette,
  • M. Schlautman,
  • C. Sawyer

Journal volume & issue
Vol. 82
p. 102735

Abstract


Advances in data availability from high-frequency, near-real-time multiparameter sensors, laboratory analysis, and in-situ and remote observations have driven the development of machine learning (ML) models for applications such as monitoring toxic harmful algal blooms (HABs). However, ML predictions are affected both by model uncertainty, which stems from the inherent model structure, and by errors in the measurements that make up the input dataset. Measurement uncertainty, for example, arises from sample collection, sensor drift, laboratory analysis, and sample handling errors. While the impacts of model uncertainty are commonly addressed using probabilistic approaches, the effect of measurement uncertainty is less studied because detailed measurement information is rarely available. This study assesses the impact of measurement uncertainty on ML predictions of chlorophyll-a concentration, used as an index of HABs, in a mesotrophic lake. Using randomized subsets of the measured input datasets that mimic possible chlorophyll-a concentration distributions, the study built 1000 Random Forest (RF) and Support Vector Regression (SVR) models. An independent measured dataset was used to validate the ensemble models, which made it possible to evaluate model performance and to construct prediction intervals that quantify the propagated uncertainty. The model predictions had mean absolute errors (MAE) ranging from 0.16 μg/L to 5.19 μg/L and root mean square errors (RMSE) ranging from 0.20 μg/L to 7.39 μg/L. The highest uncertainty coverage, 0.71, was observed for the RF model that did not use chlorophyll-a sensor values as a predictor. The study found that training dataset size, which differs between high-frequency sensor data and manually sampled data, influences how much of the measurement uncertainty is covered. The results demonstrate how well ML models can capture various HAB patterns when given diverse measurement variables. These findings give researchers insight into how to lessen the impact of measurement uncertainty when using ML models as decision-support tools for HAB management.
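The ensemble idea in the abstract (many models, each trained on a plausible realization of the measured chlorophyll-a values, then summarized as prediction intervals) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. First, it reads "randomized subsets" as Gaussian perturbations whose standard deviation is proportional to each value (a hypothetical 10% relative error). Second, it swaps the study's RF and SVR models for a one-predictor ordinary least squares fit so that it runs with the Python standard library alone. Third, it interprets "uncertainty coverage" as the fraction of independent observations that fall inside a 95% prediction interval. All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics


def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares (a stand-in for RF/SVR, not the study's models)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def perturb(values, rel_error, rng):
    """Draw one plausible realization of the measured values.

    Assumption: Gaussian error with standard deviation rel_error * |value|.
    """
    return [v + rng.gauss(0.0, rel_error * abs(v)) for v in values]


def ensemble_predict(x_train, y_train, x_test, n_models=1000, rel_error=0.1, seed=0):
    """Train n_models regressors, each on a perturbed copy of the training targets."""
    rng = random.Random(seed)
    preds = []  # preds[m][i]: prediction of model m at test point i
    for _ in range(n_models):
        intercept, slope = fit_ols(x_train, perturb(y_train, rel_error, rng))
        preds.append([intercept + slope * x for x in x_test])
    return preds


def summarize(preds, y_obs):
    """Return MAE and RMSE of the ensemble mean, plus 95% prediction-interval coverage."""
    per_point = list(zip(*preds))  # one tuple of n_models predictions per test point
    means = [statistics.fmean(p) for p in per_point]
    errors = [m - y for m, y in zip(means, y_obs)]
    mae = statistics.fmean(abs(e) for e in errors)
    rmse = math.sqrt(statistics.fmean(e * e for e in errors))
    covered = 0
    for p, y in zip(per_point, y_obs):
        # 39 cut points at 2.5% steps: q[0] is the 2.5th, q[-1] the 97.5th percentile
        q = statistics.quantiles(p, n=40, method="inclusive")
        if q[0] <= y <= q[-1]:
            covered += 1
    return mae, rmse, covered / len(y_obs)


# Synthetic illustration only; these are not data from the study.
x_train = [float(i) for i in range(1, 21)]
y_train = [2.0 + 0.5 * x for x in x_train]
x_test = [2.5, 7.5, 12.5]
y_obs = [2.0 + 0.5 * x for x in x_test]

preds = ensemble_predict(x_train, y_train, x_test)
mae, rmse, coverage = summarize(preds, y_obs)
print(f"MAE={mae:.3f} RMSE={rmse:.3f} coverage={coverage:.2f}")
```

Because every ensemble member sees a different realization of the targets, the spread of predictions at each test point reflects how the assumed measurement error propagates through the model. With real multivariate lake data, `fit_ols` would be replaced by an RF or SVR regressor.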

Keywords