IEEE Access (Jan 2023)

Imputation of Missing Clinical Covariates for Downstream Classification Problems

  • Benjamin Agbo,
  • Hussain Al-Aqrabi,
  • Tariq Alsboui,
  • Muhammad Hussain,
  • Richard Hill

DOI
https://doi.org/10.1109/ACCESS.2023.3317775
Journal volume & issue
Vol. 11
pp. 102935–102943

Abstract


Noticeable growth in the use of intelligent devices has led to vast amounts of data being generated by sensor devices. Large databases commonly contain many missing values. This is a challenge for data miners because many data-analysis methods only work well on complete databases. A traditional approach to handling missing data is to discard instances with missing values and use only complete cases for analysis. However, research has shown that this approach is impractical, especially when large amounts of data are missing. This has increased the need for strategies that replace missing values with plausible values through imputation. This study presents an imputation strategy called med.BFMVI for recovering missing values before training downstream classification models. Experiments simulated missingness rates from 10% to 40% under the missing completely at random (MCAR) and missing at random (MAR) mechanisms, and the proposed technique was compared against state-of-the-art techniques. Overall, the proposed algorithm achieved higher imputation accuracy than the benchmark techniques and yielded significant improvements in downstream learning.
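The abstract does not spell out the experimental protocol or the med.BFMVI algorithm, so the following is only a minimal sketch of the general kind of evaluation it describes: inject missingness at rates from 10% to 40% under MCAR and MAR, impute, and score accuracy on the blanked entries. Everything here is an assumption for illustration: synthetic Gaussian data stands in for clinical covariates, the rank-weighted MAR rule and its hypothetical `driver`/`target` columns are one common simulation choice, column-median imputation is a simple stand-in baseline (not the proposed method), and RMSE on the blanked entries is used as the accuracy measure. The sketch also counts how many rows survive complete-case analysis.

```python
import numpy as np


def inject_mcar(X, rate, rng):
    """Blank each entry independently with probability `rate` (missing completely at random)."""
    X_miss = X.astype(float).copy()
    mask = rng.random(X.shape) < rate
    X_miss[mask] = np.nan
    return X_miss, mask


def inject_mar(X, rate, rng, driver=0, target=1):
    """Blank `target` in a `rate` fraction of rows, favouring rows where the
    always-observed `driver` column is large (missing at random).
    The driver/target pair is a hypothetical choice for illustration."""
    X_miss = X.astype(float).copy()
    n = X.shape[0]
    n_missing = int(round(rate * n))
    ranks = np.argsort(np.argsort(X[:, driver])) + 1.0  # ranks 1..n of the driver values
    rows = rng.choice(n, size=n_missing, replace=False, p=ranks / ranks.sum())
    mask = np.zeros(X.shape, dtype=bool)
    mask[rows, target] = True
    X_miss[mask] = np.nan
    return X_miss, mask


def complete_case_fraction(X_miss):
    """Share of rows that survive listwise deletion (no NaN in any column)."""
    return float((~np.isnan(X_miss).any(axis=1)).mean())


def median_impute(X_miss):
    """Fill each NaN with its column median: a simple baseline, NOT med.BFMVI."""
    X_filled = X_miss.copy()
    medians = np.nanmedian(X_miss, axis=0)
    rows, cols = np.nonzero(np.isnan(X_filled))
    X_filled[rows, cols] = medians[cols]
    return X_filled


def rmse_on_missing(X_true, X_filled, mask):
    """Imputation error measured only on the entries that were blanked."""
    diff = X_true[mask] - X_filled[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))  # synthetic stand-in for clinical covariates
    for rate in (0.1, 0.2, 0.3, 0.4):  # the 10% to 40% range quoted in the abstract
        for name, inject in (("MCAR", inject_mcar), ("MAR", inject_mar)):
            X_miss, mask = inject(X, rate, rng)
            kept = complete_case_fraction(X_miss)
            err = rmse_on_missing(X, median_impute(X_miss), mask)
            print(f"{name} {rate:.0%}: complete cases kept {kept:.1%}, "
                  f"median-impute RMSE {err:.3f}")
```

Under MCAR with d features, the expected share of complete rows is (1 - rate)^d. With 8 features at 40% missingness, that is 0.6^8, or roughly 1.7% of rows. This illustrates why discarding incomplete instances becomes impractical as missingness grows.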

Keywords