Journal of Hydroinformatics (Nov 2023)

Predicting cyanobacteria abundance with Bayesian zero-inflated models

  • Yirao Zhang,
  • Nicolas M. Peleato

DOI
https://doi.org/10.2166/hydro.2023.229
Journal volume & issue
Vol. 25, no. 6
pp. 2161 – 2176

Abstract


Cyanobacterial blooms are a persistent concern for water management and treatment, with blooms potentially releasing toxins and degrading water quality. However, previous models have not considered the zero inflation of cyanobacteria count data. Typically, a relatively large proportion of measured counts are zeros or non-detects, indicating either that no cyanobacteria were present or that the cell number was too low to be detected. Commonly used Poisson and negative binomial models for count data underestimate the probability of zeros, making these models less reliable. This study proposes a Bayesian approach that fits cyanobacteria abundance data with mixture models designed to handle zero-inflated data. Predictor variables considered included weather and water quality measures that can easily be obtained day-to-day. The optimal model (zero-inflated negative binomial) was used to predict cyanobacteria alert levels on a separate test set. The ability to predict narrow alert levels was limited; however, 76% accuracy was achieved in predicting whether cyanobacteria counts were above or below 1,000 cells/mL. Parameter estimates were highly variable, demonstrating that complex and uncertain factors influence cyanobacteria count predictions. The modelling approach can be applied to a wide range of environmental problems where zero-inflated data are common.

HIGHLIGHTS

  • Bayesian mixture models were used to model zero-inflated cyanobacteria count data.
  • A Bayesian variable selection method was applied to select important variables.
  • A zero-inflated model achieved 76% accuracy in predicting binary alert levels.
  • The Bayesian framework produced probabilistic categorization of alert levels.
  • The model is well suited for management of complex systems with high uncertainty.
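As a rough illustration of the zero-inflation idea described in the abstract, the sketch below writes out a zero-inflated negative binomial (ZINB) probability mass function in its common textbook form: with probability `pi` an observation is a structural zero, and otherwise the count follows a negative binomial with mean `mu` and dispersion `r`. This is not the authors' implementation. The paper fits the model in a Bayesian framework with weather and water quality predictors, whereas here the parameter values (`mu`, `r`, `pi`) are fixed, hypothetical numbers chosen only for demonstration, and the function names are invented for this example.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial pmf parameterised by mean mu and dispersion r."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: structural zero with probability pi, else NB(mu, r)."""
    base = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


def poisson_pmf(k, lam):
    """Poisson pmf, computed on the log scale for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def prob_at_or_above(threshold, mu, r, pi):
    """P(count >= threshold) under the ZINB model."""
    return 1.0 - sum(zinb_pmf(k, mu, r, pi) for k in range(threshold))


# Hypothetical parameter values, for illustration only (not from the paper).
mu, r, pi = 800.0, 0.5, 0.3
zinb_mean = (1.0 - pi) * mu  # overall mean of the zero-inflated mixture

print("P(0) under ZINB:", zinb_pmf(0, mu, r, pi))
print("P(0) under Poisson with the same mean:", poisson_pmf(0, zinb_mean))
print("P(count >= 1000 cells/mL) under ZINB:", prob_at_or_above(1000, mu, r, pi))
```

With the same overall mean, the Poisson model assigns far less probability to a zero count than the mixture does, which is the shortcoming the abstract attributes to standard count models. The last line shows how a fitted count distribution can be turned into a probability of exceeding the 1,000 cells/mL threshold; in a Bayesian fit this probability would be averaged over posterior draws of the parameters rather than evaluated at a single fixed setting.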

Keywords