Water Practice and Technology (Dec 2022)
Investigation of scarce input data augmentation for modelling nitrogenous compounds in South African rivers
Abstract
In this study, basic interpolation and machine learning (ML) data augmentation were applied to the scarce input data of two models, the Water Quality Analysis Simulation Programme (WASP) and a Continuous Stirred Tank Reactor (CSTR) model, which were used to model nitrogenous compound degradation in a river reach. Model outputs were assessed for statistically significant differences. Furthermore, artificial data gaps were introduced into the input data to study the limitations of each augmentation method. The Python Data Analysis Library (Pandas) was used to perform the deterministic interpolation. In addition, the effect of missing data at local maxima was investigated. The results showed little statistical difference between the deterministic interpolation methods for data augmentation, but larger differences when the input data were infilled specifically at locations where extrema occurred.
HIGHLIGHTS
Basic interpolation methods did not produce statistically significant differences in augmented datasets.
Increasing the gaps yielded greater differences between augmented datasets.
ML methods on real and artificial gaps produced acceptable results.
No significant differences between the WASP and Basic Model on real and artificial input.
Difference between the WASP and Basic Model on real and artificial input.
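The gap experiment described in the abstract can be illustrated with a minimal Pandas sketch. It is not the authors' code: the concentration values, the series name `nh3_mg_per_l`, the helper `introduce_gaps`, and the choice of linear interpolation are all assumptions made for illustration. It removes one observation at the local maximum and one elsewhere, infills both, and compares the infill errors.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly ammonia concentrations (mg/L); values are invented for illustration.
dates = pd.date_range("2020-01-01", periods=12, freq="MS")
observed = pd.Series(
    [0.8, 1.1, 1.6, 2.4, 1.9, 1.3, 0.9, 0.7, 1.0, 1.5, 1.2, 0.9],
    index=dates,
    name="nh3_mg_per_l",
)


def introduce_gaps(series, positions):
    """Return a copy of the series with the given integer positions set to NaN."""
    gapped = series.copy()
    gapped.iloc[positions] = np.nan
    return gapped


# One artificial gap at the maximum, one at an ordinary point (position 6).
peak_pos = int(observed.to_numpy().argmax())
other_pos = 6
gap_at_peak = introduce_gaps(observed, [peak_pos])
gap_elsewhere = introduce_gaps(observed, [other_pos])

# Deterministic infilling; "linear" treats the samples as equally spaced.
filled_peak = gap_at_peak.interpolate(method="linear")
filled_else = gap_elsewhere.interpolate(method="linear")

# Absolute infill error at each removed observation.
err_peak = abs(filled_peak.iloc[peak_pos] - observed.iloc[peak_pos])
err_else = abs(filled_else.iloc[other_pos] - observed.iloc[other_pos])

print(f"error at peak gap:  {err_peak:.2f} mg/L")
print(f"error at other gap: {err_else:.2f} mg/L")
```

With these invented values, linear interpolation flattens the peak and the infill error there is larger than at the ordinary point, which is the kind of effect the abstract reports for gaps at extrema. Other `method` options of `Series.interpolate` can be swapped in to compare deterministic methods.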
Keywords