Missing Value Imputation Methods for Electronic Health Records

Konstantinos Psychogyios; Loukas Ilias; Christos Ntanos; Dimitris Askounis

doi:10.1109/ACCESS.2023.3251919

IEEE Access (Jan 2023)

Affiliations

Konstantinos Psychogyios: Decision Support Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Loukas Ilias: Decision Support Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Christos Ntanos: Decision Support Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece
Dimitris Askounis: Decision Support Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece

DOI: https://doi.org/10.1109/ACCESS.2023.3251919
Journal volume & issue: Vol. 11
pp. 21562 – 21574

Abstract

Electronic health records (EHR) are patient-level data, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, EHR allow patients to access their data easily and help staff with procedural management tasks such as sharing information across organizations. Moreover, researchers commonly use this type of data for prediction and classification with statistical and machine learning methods. However, missing values are very common in such measurements. Even though the missingness is often substantial, it is usually handled poorly, with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to capture the complex relationships that characterize these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE) with kNN for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every $N$ epochs, using a different value of $k$ each time, to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements to both the architecture and the training procedure. These models are compared with those usually employed in clinical research studies on both imputation and post-imputation prediction. Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results.
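
As a rough illustration of the training schedule described in the abstract, the sketch below pre-imputes with kNN, trains a small denoising autoencoder, and re-runs kNN with a different k at fixed epoch intervals. It is not the authors' implementation. The names `knn_impute`, `train_dae`, `refresh_every` (standing in for $N$) and `k_values` are hypothetical, as are the single hidden layer, the masking noise, the loss restricted to observed entries, and the choice to re-run kNN on the original incomplete matrix. Only NumPy is used.

```python
import numpy as np

def knn_impute(X, k):
    """Fill NaNs with the mean of the k nearest rows that observe the feature.

    Distances use only features observed in both rows. Illustrative stand-in,
    not the paper's code.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    col_means = np.nanmean(X, axis=0)            # fallback if no donor exists
    filled = X.copy()
    for i in range(X.shape[0]):
        missing = np.where(~observed[i])[0]
        if missing.size == 0:
            continue
        shared = observed & observed[i]          # features seen in both rows
        diff = np.where(shared, X - X[i], 0.0)
        counts = shared.sum(axis=1)
        dist = np.full(X.shape[0], np.inf)
        valid = counts > 0
        dist[valid] = np.sqrt((diff[valid] ** 2).sum(axis=1) / counts[valid])
        dist[i] = np.inf                         # a row is not its own neighbor
        order = np.argsort(dist)
        for j in missing:
            donors = [r for r in order
                      if observed[r, j] and np.isfinite(dist[r])][:k]
            filled[i, j] = X[donors, j].mean() if donors else col_means[j]
    return filled

def train_dae(X_missing, epochs=200, refresh_every=50, k_values=(3, 5, 7, 9),
              hidden=8, lr=0.05, drop=0.2, seed=0):
    """Train a one-hidden-layer denoising autoencoder with periodic kNN refresh."""
    rng = np.random.default_rng(seed)
    X_missing = np.asarray(X_missing, dtype=float)
    mask = ~np.isnan(X_missing)                  # True where a value was observed
    n, d = X_missing.shape
    W1 = rng.normal(0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, d)); b2 = np.zeros(d)
    for epoch in range(epochs):
        if epoch % refresh_every == 0:
            # Assumption: cycle through k_values and re-impute the raw data.
            k = k_values[(epoch // refresh_every) % len(k_values)]
            X_pre = knn_impute(X_missing, k)
        noisy = X_pre * (rng.random((n, d)) > drop)   # masking noise on inputs
        H = np.tanh(noisy @ W1 + b1)
        out = H @ W2 + b2
        # Gradient of the mean squared error over observed entries only.
        err = (out - X_pre) * mask / mask.sum()
        dH = (err @ W2.T) * (1 - H ** 2)
        W2 -= lr * (H.T @ err); b2 -= lr * err.sum(axis=0)
        W1 -= lr * (noisy.T @ dH); b1 -= lr * dH.sum(axis=0)
    recon = np.tanh(X_pre @ W1 + b1) @ W2 + b2
    # Keep observed values; use the reconstruction only where data was missing.
    return np.where(mask, X_missing, recon)
```

In practice the features would typically be standardized before training, and a call such as `train_dae(X_missing, epochs=200, refresh_every=50)` returns a completed matrix in which observed entries are unchanged.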

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
