Geoscientific Model Development (Sep 2020)

ML-SWAN-v1: a hybrid machine learning framework for the concentration prediction and discovery of transport pathways of surface water nutrients

  • B. Wang,
  • M. R. Hipsey,
  • C. Oldham

DOI
https://doi.org/10.5194/gmd-13-4253-2020
Journal volume & issue
Vol. 13
pp. 4253 – 4270

Abstract

Nutrient data from catchments discharging to receiving waters are monitored for catchment management. However, nutrient data are often sparse in time and space and have non-linear responses to environmental factors, making it difficult to systematically analyse long- and short-term trends and construct nutrient budgets. To address these challenges, we developed a hybrid machine learning (ML) framework that first separated total flow into baseflow and quickflow, then generated data for missing nutrient species, and finally used the pre-generated nutrient data as additional variables in a simulation of tributary water quality. Hybrid random forest (RF) and gradient boosting machine (GBM) models were employed and their performance compared with a linear model, a multivariate weighted regression model, and stand-alone RF and GBM models that did not pre-generate nutrient data. The six models were used to predict the concentrations of six different nutrients discharged from two study sites in Western Australia: Ellen Brook (small and ephemeral) and the Murray River (large and perennial). Our results showed that the hybrid RF and GBM models had significantly higher accuracy and lower prediction uncertainty for almost all nutrient species across the two sites. The pre-generated nutrient and hydrological data were highlighted as the most important components of the hybrid model. The model results also indicated different hydrological transport pathways for total nitrogen (TN) export from the two tributary catchments. We demonstrated that the hybrid model provides a flexible method to combine data of varied resolution and quality and accurately predicts how surface water nutrient concentrations respond to hydrologic variability.
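
The first step of the framework, separating baseflow and quickflow from total flow, can be illustrated with a minimal sketch. The abstract does not state which separation method the authors used, so the example below assumes a single forward pass of the Lyne-Hollick recursive digital filter, one common choice. The function name `separate_flow`, the filter parameter `alpha = 0.925`, and the discharge values are all hypothetical and are not taken from the paper.

```python
def separate_flow(total_flow, alpha=0.925):
    """Split a discharge series into (baseflow, quickflow).

    Assumption: one forward pass of the Lyne-Hollick digital filter.
    This is an illustrative stand-in, not the method used in ML-SWAN-v1.
    """
    if not total_flow:
        return [], []
    # Assume the record starts with no quickflow (baseflow-only state).
    quick = [0.0]
    for prev_q, q in zip(total_flow, total_flow[1:]):
        f = alpha * quick[-1] + 0.5 * (1.0 + alpha) * (q - prev_q)
        # Clamp so quickflow is non-negative and never exceeds total flow.
        quick.append(min(max(f, 0.0), q))
    base = [q - f for q, f in zip(total_flow, quick)]
    return base, quick


# Made-up daily discharge values, for illustration only.
flow = [1.0, 1.0, 5.0, 3.0, 2.0, 1.5, 1.2]
baseflow, quickflow = separate_flow(flow)
```

In a hybrid workflow of the kind described above, the two resulting series would serve as hydrological predictors alongside the pre-generated nutrient data; how the authors actually configured that step is not specified in the abstract.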