IEEE Access (Jan 2022)
Long Short Term Memory Water Quality Predictive Model Discrepancy Mitigation Through Genetic Algorithm Optimisation and Ensemble Modeling
Abstract
A predictive long short-term memory (LSTM) model developed on a particular water quality dataset applies only to that dataset and may fail to make accurate predictions on another. This paper focuses on improving LSTM model tolerance by mitigating the discrepancies in prediction capability that arise when a model is applied to different datasets. Two predictive LSTM models are developed from the water quality datasets Baffle and Burnett and are optimised using the metaheuristic genetic algorithm (GA) to create hybrid GA-optimised LSTM models. These hybrid models are subsequently combined with a linear weight-based technique to develop a tolerant predictive ensemble model. The models successfully predict river water quality in terms of dissolved oxygen concentration. After GA optimisation, the RMSE values of the Baffle and Burnett models decrease by 42.42% and 10.71%, respectively. Furthermore, two ensemble models are developed from the GA-hybrid models, namely the average ensemble and the optimal weighted ensemble. Relative to the GA-Baffle model, the RMSE decreases by 5.05% for the average ensemble and 6.06% for the weighted ensemble; relative to the GA-Burnett model, it decreases by 7.84% and 8.82%, respectively. When tested on unseen and unrelated datasets, the models make accurate predictions, indicating that they are applicable in domains outside the water sector. The consistent and similar performance of the models across datasets illustrates that the proposed ensemble scheme successfully mitigates discrepancies in the predictive capacity of individual LSTM models. The observed model performance highlights the datasets on which the models could potentially make accurate predictions.
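The linear weight-based combination described above can be illustrated with a minimal sketch. It assumes two models whose predictions are blended as w * a + (1 - w) * b, with the average ensemble as the special case w = 0.5. The grid search used to pick the weight, the function names, and the synthetic dissolved-oxygen-like series are all assumptions made for illustration; the abstract does not state how the optimal weights are obtained, and no values here come from the Baffle or Burnett datasets.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def weighted_ensemble(pred_a, pred_b, w):
    """Linear blend of two prediction series; w = 0.5 is the average ensemble."""
    return w * np.asarray(pred_a, dtype=float) + (1.0 - w) * np.asarray(pred_b, dtype=float)


def best_weight(y_true, pred_a, pred_b, steps=101):
    """Grid-search w in [0, 1] for the lowest RMSE (assumed selection method)."""
    candidates = np.linspace(0.0, 1.0, steps)
    errors = [rmse(y_true, weighted_ensemble(pred_a, pred_b, w)) for w in candidates]
    return float(candidates[int(np.argmin(errors))])


# Synthetic stand-ins for the two GA-optimised LSTM outputs (illustrative only).
rng = np.random.default_rng(0)
y_true = 8.0 + np.sin(np.linspace(0.0, 6.0, 200))
pred_a = y_true + rng.normal(0.0, 0.30, y_true.size)  # hypothetical model A
pred_b = y_true + rng.normal(0.0, 0.20, y_true.size)  # hypothetical model B

w_opt = best_weight(y_true, pred_a, pred_b)
avg_pred = weighted_ensemble(pred_a, pred_b, 0.5)
opt_pred = weighted_ensemble(pred_a, pred_b, w_opt)

print("average ensemble RMSE :", rmse(y_true, avg_pred))
print("weighted ensemble RMSE:", rmse(y_true, opt_pred), "at w =", w_opt)
```

Because the candidate grid contains w = 0, w = 0.5 and w = 1, the selected blend can do no worse on the fitting data than either individual model or the plain average. In practice the weight would be chosen on a validation split rather than on the data used for the final evaluation.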
Keywords