Ain Shams Engineering Journal (Jun 2024)

A comparative analysis of missing data imputation techniques on sedimentation data

  • Wing Son Loh,
  • Lloyd Ling,
  • Ren Jie Chin,
  • Sai Hin Lai,
  • Kar Kuan Loo,
  • Choon Sen Seah

Journal volume & issue
Vol. 15, no. 6
p. 102717

Abstract

Sediment data cover a range of hydrological variables governed by complex sediment hydrodynamics, such as sedimentation rates, and are often incomplete. Making complete sedimentation data available is therefore essential. This study compares the performance of four techniques for imputing missing fine sediment data, namely k-Nearest Neighbourhood (k-NN), Support Vector Regression (SVR), Multiple Regression (MR), and Artificial Neural Network (ANN), under both the single imputation (SI) and multiple imputation (MI) regimes. Across missing data proportions of 10% to 50%, the ANN gave the best results, with consistent performance metrics under both SI and MI. At the highest missing data proportion (50%), the ANN achieved a root mean squared error (RMSE) of 0.000882, a mean absolute error (MAE) of 0.000595, a coefficient of determination (R²) of 71%, and a Kling-Gupta Efficiency (KGE) of 72%. Ranked from best to worst imputation performance, the techniques are: ANN, SVR, MR, and k-NN.
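
For readers unfamiliar with the four metrics named in the abstract, the following is a minimal sketch of how they are commonly computed when imputed values are compared against held-out true values. It is not the authors' code. The function name `imputation_metrics` and the toy arrays are hypothetical, R² is assumed to be defined as 1 - SS_res/SS_tot (the abstract does not say which definition was used), and KGE is assumed to follow the standard formulation based on correlation, variability ratio, and bias ratio.

```python
import numpy as np


def imputation_metrics(observed, imputed):
    """Compare imputed values against the true (held-out) values.

    Hypothetical helper for illustration only; the definitions of R2 and
    KGE below are common conventions, assumed rather than taken from the paper.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(imputed, dtype=float)
    err = sim - obs

    # Root mean squared error and mean absolute error.
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))

    # Coefficient of determination, assumed here as 1 - SS_res / SS_tot.
    ss_res = float(np.sum(err ** 2))
    ss_tot = float(np.sum((obs - obs.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot

    # Kling-Gupta Efficiency: combines linear correlation (r),
    # variability ratio (alpha) and bias ratio (beta).
    r = float(np.corrcoef(obs, sim)[0, 1])
    alpha = float(sim.std() / obs.std())
    beta = float(sim.mean() / obs.mean())
    kge = 1.0 - float(np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

    return {"RMSE": rmse, "MAE": mae, "R2": r2, "KGE": kge}


if __name__ == "__main__":
    # Toy values, not data from the study.
    true_values = [0.0021, 0.0034, 0.0018, 0.0042, 0.0029]
    imputed_values = [0.0024, 0.0031, 0.0020, 0.0039, 0.0030]
    for name, value in imputation_metrics(true_values, imputed_values).items():
        print(f"{name}: {value:.6f}")
```

RMSE and MAE are in the units of the imputed variable and lower is better, whereas R² and KGE are dimensionless and equal 1 (100%) for a perfect imputation.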

Keywords