IEEE Access (Jan 2022)

An Improved Imputation Method for Accurate Prediction of Imputed Dataset Based Radon Time Series

  • Adil Aslam Mir,
  • Fatih Vehbi Celebi,
  • Muhammad Rafique,
  • Lal Hussain,
  • Ahmed S. Almasoud,
  • Masoud Alajmi,
  • Fahd N. Al-Wesabi,
  • Anwer Mustafa Hilal

DOI
https://doi.org/10.1109/ACCESS.2022.3151892
Journal volume & issue
Vol. 10
pp. 20590 – 20601

Abstract

This article evaluates the performance of a new methodology, imputation by feature importance (IBFI), and the suitability of its imputed datasets for subsequent regression on soil radon gas concentration (SRGC) time-series data. The time series used for experimentation were collected over a fourteen-month period that included four seismic events. In these experiments, IBFI imputed the missing patterns in the investigated time series more efficiently than the traditionally used imputation methods, viz. mean, median, mode, predictive mean matching (PMM), and hot-deck imputation. IBFI was applied in a variety of settings, such as data missing not at random (MNAR), missing completely at random (MCAR), and missing at random (MAR), with missingness percentages ranging from 10% to 30%. The imputed datasets, nine for each imputation method, were then used to predict the attribute of interest, radon concentration (RN), with the thoron, temperature, relative humidity, and pressure time series as independent attributes. A support vector machine (SVM) with a linear kernel was used as the learning algorithm, and its performance was taken as a measure of how efficient and unbiased the imputed values were. The statistical performance measures root mean squared log error (RMSLE), root mean square error (RMSE), mean squared error (MSE), and mean absolute percentage error (MAPE) were calculated for this assessment. The findings show that the IBFI-imputed dataset yields a better-fitted model: model generation and prediction on the IBFI-imputed time series give more accurate predictions than on the mean, median, mode, PMM, and hot-deck imputed time series. Furthermore, the PMM and median imputed time series perform close to the IBFI-imputed time series.
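
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the definitions below are the commonly used textbook ones and are an assumption, as are the function names and the sample values, which are purely illustrative and not taken from the study. In particular, RMSLE is written here with log(1 + x), one common convention, and MAPE is expressed in percent.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


def rmsle(y_true, y_pred):
    """Root mean squared log error, using log(1 + x) (assumed convention).

    All values must be greater than -1 for the logarithm to be defined.
    """
    squared_log_diffs = [
        (math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_diffs) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero.
    """
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(y_true)


if __name__ == "__main__":
    # Hypothetical observed and predicted values, for illustration only.
    observed = [100.0, 200.0, 400.0]
    predicted = [110.0, 190.0, 400.0]
    for name, metric in [("MSE", mse), ("RMSE", rmse),
                         ("RMSLE", rmsle), ("MAPE", mape)]:
        print(name, metric(observed, predicted))
```

Lower values indicate a better fit for all four measures, so under the design described above the model trained on the best imputed dataset would be expected to score lowest.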

Keywords