Water Science and Technology (Jun 2024)
Towards good modelling practice for parallel hybrid models for wastewater treatment processes
Abstract
This study explores various approaches to formulating a parallel hybrid model (HM) for Water and Resource Recovery Facilities (WRRFs) that merges a mechanistic model with a data-driven model. In the study, the HM is constructed by training a neural network (NN) on the residual of the mechanistic model for effluent nitrate. In an initial experiment using the Benchmark Simulation Model no. 1, a parallel HM effectively addressed both the limitations in the mechanistic model's representation of autotrophic bacteria growth and the data-driven model's inability to extrapolate. Next, different versions of a parallel HM of a large pilot-scale WRRF are constructed, using different calibration/training datasets and different versions of the mechanistic model, to investigate the balance between the calibration effort for the mechanistic model and the compensation by the NN component. The HM can improve predictions compared to the mechanistic model. Training the NN on an independent validation dataset produced better results than training it on the calibration dataset. Interestingly, the best performance is achieved for the HM based on a mechanistic model using default (uncalibrated) parameters. Both a long short-term memory (LSTM) network and a convolutional neural network (CNN) are tested as data-driven components, with the CNN HM (root-mean-squared error (RMSE) = 1.58 mg NO3-N/L) outperforming the LSTM HM (RMSE = 4.17 mg NO3-N/L).
HIGHLIGHTS
In a parallel hybrid model (HM), a data-driven component compensates for structural gaps in a mechanistic model.
A data-driven component of an HM should be trained on an independent dataset.
Integrating an uncalibrated mechanistic model in a parallel HM may lead to better results than applying an overly calibrated mechanistic model.
A convolutional neural network outperforms a long short-term memory network as the data-driven component of an HM.
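The parallel construction described in the abstract, in which a data-driven component is trained on the residual of a mechanistic model and the two outputs are summed, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the study's implementation: the data are synthetic, `mechanistic_model` is a hypothetical one-line stand-in for a real process model, and a small ridge regression on polynomial features replaces the neural network so that the example runs with NumPy alone.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-squared error between observations and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mechanistic_model(x):
    # Hypothetical stand-in for a mechanistic model: it captures the
    # linear part of the process but misses a quadratic effect.
    return 2.0 * x[:, 0] + 1.0


def features(x):
    # Polynomial features for the data-driven component (assumed form).
    return np.column_stack([np.ones(len(x)), x, x ** 2])


def fit_residual_model(x, residual, ridge=1e-6):
    # Ridge regression fitted to the mechanistic model's residual.
    # This replaces the neural network of the study for brevity.
    phi = features(x)
    a = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(a, phi.T @ residual)


def hybrid_predict(x, weights):
    # Parallel hybrid: mechanistic output plus predicted residual.
    return mechanistic_model(x) + features(x) @ weights


# Synthetic observations: mechanistic structure + unmodelled term + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(400, 1))
y = 2.0 * x[:, 0] + 1.0 + 0.8 * x[:, 0] ** 2 + rng.normal(0.0, 0.05, size=400)

# Train the residual model on one split, evaluate on a held-out split.
x_train, y_train = x[:300], y[:300]
x_test, y_test = x[300:], y[300:]

weights = fit_residual_model(x_train, y_train - mechanistic_model(x_train))

rmse_mech = rmse(y_test, mechanistic_model(x_test))
rmse_hybrid = rmse(y_test, hybrid_predict(x_test, weights))
print(f"mechanistic RMSE: {rmse_mech:.3f}, hybrid RMSE: {rmse_hybrid:.3f}")
```

In this toy setting the residual model only has to learn the term the mechanistic model omits, which is the role the abstract assigns to the NN. In a real application, the split used to train the residual model would presumably be kept separate from the data used to calibrate the mechanistic model, in line with the finding that training on an independent dataset gave better results.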
Keywords