AIMS Environmental Science (Jul 2024)

Analysis of Data Splitting on Streamflow Prediction using Random Forest

  • Diksha Puri,
  • Parveen Sihag,
  • Mohindra Singh Thakur,
  • Mohammed Jameel,
  • Aaron Anil Chadee,
  • Mohammad Azamathulla Hazi

DOI
https://doi.org/10.3934/environsci.2024029
Journal volume & issue
Vol. 11, no. 4
pp. 593–609

Abstract

This study uses random forest (RF) to forecast streamflow in the Kesinga River basin. A total of 169 monthly data points covering the years 1991–2004 were used to build the streamflow prediction model. The dataset was split into training and testing sets using several ratios, including 50/50, 60/40, 70/30, and 80/20. The resulting models were evaluated with three statistical indices: the root mean square error (RMSE), the mean absolute error (MAE), and the correlation coefficient (CC). The analysis showed that the training/testing ratio had a substantial impact on the RF model's predictive ability, with the best performance obtained at a 60/40 ratio. These findings identify a suitable dataset ratio for accurate streamflow prediction, which will be useful to hydraulic engineers during the design and engineering stages of water projects.
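The splitting and evaluation steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes a chronological split, since the abstract does not say whether records were shuffled. It takes CC to be the Pearson correlation coefficient. The helper names (`split_by_ratio`, `rmse`, `mae`, `cc`) are hypothetical. The random forest itself is left out: predictions from any regressor fitted on the training part could be passed to the metric functions.

```python
import math


def split_by_ratio(series, train_fraction):
    """Split a sequence into (train, test) parts, preserving record order.

    Assumption (not stated in the abstract): the first `train_fraction`
    of the monthly records is used for training, the rest for testing.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    n_train = round(len(series) * train_fraction)
    return series[:n_train], series[n_train:]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def cc(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of CC)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


if __name__ == "__main__":
    # Placeholder index series standing in for the 169 monthly records;
    # these are NOT the study's streamflow values.
    records = list(range(169))
    for frac in (0.5, 0.6, 0.7, 0.8):
        train, test = split_by_ratio(records, frac)
        print(f"{int(frac * 100)}/{int(round((1 - frac) * 100))}: "
              f"{len(train)} train, {len(test)} test")
```

A chronological split keeps later months out of training. This avoids leaking future information into the fitted model. If the study used a random split instead, the slicing in `split_by_ratio` would be replaced by a shuffled index.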

Keywords