Journal of King Saud University: Computer and Information Sciences (Jan 2023)

Missing values imputation using Fuzzy K-Top Matching Value

  • Azza Ali,
  • Mervat Abu-Elkheir,
  • Ahmed Atwan,
  • Mohammed Elmogy

Journal volume & issue
Vol. 35, no. 1
pp. 426 – 437

Abstract

Missing data arise when values of variables or entire observations are absent. Researchers typically either exclude the affected variables and records or impute them. This study proposes Fuzzy K-Top Matching Value (FKTM) for missing value imputation. It imputes missing numerical and categorical data with estimates derived from similar records, which decreases bias. The method uses expectation-maximization and employs fuzzy clustering to find a group of similar records from which the missing values are estimated. We compare the FKTM-imputed datasets with the original Immunotherapy and Cryotherapy datasets. Multiple classification techniques are applied to the imputed datasets. Random Forest achieved the best result, with 93.3% for Cryotherapy and 85.6% for Immunotherapy. The proposed approach is also compared with Multivariate Imputation by Chained Equations (MICE) using a Support Vector Machine; it outperforms MICE with 82.2% accuracy. On the Cryotherapy dataset, the proposed approach surpasses existing strategies with 86.6% accuracy. The Levene and Shapiro-Wilk tests were used to examine the homoscedasticity and normality of the data after imputation. The results indicate that the proposed imputation procedure has no detrimental influence on the dataset. Finally, the execution time and root mean square error (RMSE) of the imputed values are determined for three datasets with varied sample sizes and data dimensions. The proposed system exhibits a fast execution time and low RMSE. Overall, FKTM performs well in the experiments and appears promising.
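To make the general idea of clustering-based imputation concrete, the following is a minimal sketch and not the authors' FKTM algorithm. It assumes numerical features only, runs plain fuzzy c-means on the complete records, and fills each missing entry with a membership-weighted average of the cluster centers. The function names (`fuzzy_cmeans`, `fuzzy_impute`, `rmse`) and the parameters `c` and `m` are illustrative choices that do not come from the paper.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means on complete rows; returns the c cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))          # random initial memberships
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers

def memberships(x, centers, observed, m=2.0):
    """Fuzzy memberships of one record, using only its observed features."""
    dist = np.linalg.norm(centers[:, observed] - x[observed], axis=1) + 1e-12
    inv = dist ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def fuzzy_impute(X, c=2, m=2.0):
    """Fill NaNs with membership-weighted averages of the cluster centers."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)
    centers = fuzzy_cmeans(X[complete], c=c, m=m)
    out = X.copy()
    for i in np.where(~complete)[0]:
        obs = ~np.isnan(X[i])
        u = memberships(X[i], centers, obs, m)
        out[i, ~obs] = u @ centers[:, ~obs]  # weighted estimate per missing feature
    return out

def rmse(true, pred):
    """Root mean square error between true and imputed values."""
    diff = np.asarray(true, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Calling `fuzzy_impute` on an array whose missing entries are `NaN` returns a copy with those entries filled. When the true values are known, as when entries are removed artificially for evaluation, `rmse` can compare them with the imputed ones.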

Keywords