IEEE Access (Jan 2025)

Advances in Biomedical Missing Data Imputation: A Survey

  • Miriam Barrabes
  • Maria Perera
  • Victor Novelle Moriano
  • Xavier Giro-I-Nieto
  • Daniel Mas Montserrat
  • Alexander G. Ioannidis

DOI
https://doi.org/10.1109/ACCESS.2024.3516506
Journal volume & issue
Vol. 13
pp. 16918 – 16932

Abstract

Ensuring data quality in biomedical sciences is crucial for reliable research outcomes, particularly as precision medicine continues to gain prominence. Missing values compromise data quality and can make it difficult to perform data-based studies. The origins of missing values in biomedical datasets are diverse, including experimental errors, equipment malfunctions, and variations in data collection protocols tailored to individual patient conditions. To address the complex nature of missing values and the unique characteristics of biomedical data, a diverse spectrum of computational imputation techniques has emerged. These methods range from traditional statistical analysis to more modern approaches such as discriminative machine learning models and deep generative networks. This survey paper provides a comprehensive overview of the extensive literature on missing data imputation techniques, with a specific focus on applications in genomics, single-cell RNA sequencing, health records, and medical imaging. We outline the fundamental principles underlying each imputation technique and present a detailed analysis of their advantages and disadvantages, categorized by missing data patterns. To aid practitioners in method selection, we offer practical recommendations based on critical factors such as dataset size, data type, and missingness rate. By synthesizing insights from existing literature, we provide a holistic perspective on the effectiveness of various imputation methods under different biomedical contexts, thereby facilitating informed decision-making for researchers and practitioners in applying imputation techniques to biomedical data processing.

Keywords