Ain Shams Engineering Journal (Jul 2024)

Comparative assessment of rainfall-based water level prediction using machine learning (ML) techniques

  • Azazkhan Ibrahimkhan Pathan,
  • Lariyah Bte Mohd Sidek,
  • Hidayah Bte Basri,
  • Muhammad Yusuf Hassan,
  • Muhammad Izzat Azhar Bin Khebir,
  • Siti Mariam Binti Allias Omar,
  • Mohd Hazri bin Moh Khambali,
  • Adrián Morales Torres,
  • Ali Najah Ahmed

Journal volume & issue
Vol. 15, no. 7
p. 102854

Abstract


Machine learning (ML) techniques are rapidly emerging as effective tools for predicting complex hydrological processes. The present study comparatively assesses the efficacy of four ML algorithms – Multi-Layer Perceptron (MLP), Extreme Gradient Boosting (XGBoost), Support Vector Regression (SVR), and Random Forest (RF) – in predicting water levels from rainfall data at the Batu Dam, Malaysia. Situated about 16 km from Kuala Lumpur city center, the Batu Dam plays a crucial role in flood mitigation and water supply. Four different input scenarios were investigated, informed by correlation analysis. The models were evaluated statistically using three key performance metrics: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R²). Preliminary results highlighted the superior predictive performance of the MLP model, especially for challenging forecasting scenarios with longer lag intervals. This investigation not only underscores the potential of data-driven methodologies in hydrology but also offers valuable insights for water resource management in the region. Across all MLP scenarios, the 3-day scenario performed best, with the lowest RMSE (0.0072) and MAE (0.005) and the highest R² score (0.9972). Owing to this exceptionally high performance, the MLP-3 model proved to be an excellent choice for the modeling task. Furthermore, MLP-3 yielded a high R² score of 0.994, and its predictions aligned closely with the actual water level values, indicating that the model fits the modeling problem very well. In contrast, the SVR-30 model had an R² score of 0.83, and its predictions were widely scattered relative to the actual water levels.
Overall, the comparison of the four ML models indicated that the MLP model offered better accuracy in predicting daily water levels across the different assessment criteria. The findings demonstrate that the MLP model successfully captures changes in the water level of a dam, paving the way for its use in work to mitigate potential risks from future natural events.
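
The three evaluation metrics named in the abstract can be computed as in the minimal sketch below. It uses the standard textbook definitions of MAE, RMSE, and R²; the function names and the sample arrays are illustrative assumptions, not the authors' code or data from the study.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values for illustration only (not water levels from the Batu Dam).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print(f"MAE={mae(y_true, y_pred):.4f}, "
      f"RMSE={rmse(y_true, y_pred):.4f}, "
      f"R2={r2(y_true, y_pred):.4f}")
```

Lower MAE and RMSE and an R² closer to 1 indicate a better fit, which is the sense in which the abstract ranks the MLP-3 scenario above SVR-30.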

Keywords