IEEE Access (Jan 2023)

Generative Adversarial Networks Assist Missing Data Imputation: A Comprehensive Survey and Evaluation

  • Reza Shahbazian,
  • Sergio Greco

DOI
https://doi.org/10.1109/ACCESS.2023.3306721
Journal volume & issue
Vol. 11
pp. 88908 – 88928

Abstract


Missing data imputation is a technique for dealing with incomplete datasets. Because many models and algorithms cannot be applied to data containing missing values, a pre-processing step is needed either to remove incomplete records or to estimate the missing values. This well-known problem is referred to as the data imputation problem. Several approaches have been designed for data imputation, and they fall into two main categories: statistical algorithms and machine learning-based algorithms. Because machine learning algorithms are optimized, they usually perform better than statistical ones. In this paper, we review the most recent literature on missing data imputation based on generative adversarial networks (GANs), which have gained tremendous attention for dealing with missing values. We examine the structures of GANs for missing data imputation and discuss the datasets and metrics commonly used for evaluation. We also cover the influence of the type of missing data, the effect of the missing data fraction, and algorithm-related problems on imputation performance. We conduct experiments on two publicly available datasets and compare the performance of GAIN, a GAN-based missing data imputation algorithm, with that of existing state-of-the-art approaches, demonstrating that the GAN-based algorithm outperforms the others in terms of RMSE and FID.
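
As a minimal sketch of how an imputation method can be scored with RMSE, the example below hides a fraction of a synthetic dataset, fills the gaps with a simple statistical baseline (column-mean imputation), and measures the error only on the hidden entries. This is not the paper's experimental setup: the synthetic data, the 20% missing fraction, and the helper names `mean_impute` and `rmse_on_missing` are all assumptions made for illustration, and GAIN itself is not implemented here.

```python
import numpy as np

def mean_impute(x_miss):
    """Fill NaNs in each column with that column's observed mean (baseline)."""
    col_means = np.nanmean(x_miss, axis=0)
    return np.where(np.isnan(x_miss), col_means, x_miss)

def rmse_on_missing(x_true, x_imputed, mask):
    """RMSE computed only over entries where mask is True (the hidden ones)."""
    diff = (x_true - x_imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

# Synthetic complete data (assumption: stands in for a real dataset).
rng = np.random.default_rng(0)
x_true = rng.normal(size=(200, 5))

# Hide roughly 20% of entries completely at random (assumed missing fraction).
mask = rng.random(x_true.shape) < 0.2      # True = entry is hidden
x_miss = np.where(mask, np.nan, x_true)

x_imputed = mean_impute(x_miss)
score = rmse_on_missing(x_true, x_imputed, mask)
print(f"RMSE on hidden entries: {score:.3f}")
```

Any other imputer, GAN-based or otherwise, could be substituted for `mean_impute` and scored the same way, provided it returns an array of the same shape with the observed entries left unchanged.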

Keywords