Applied Sciences (Oct 2022)
Analyzing the Data Completeness of Patients’ Records Using a Random Variable Approach to Predict the Incompleteness of Electronic Health Records
Abstract
The purpose of this article is to illustrate an investigation of methods that can be effectively used to predict the data incompleteness of a dataset. Here, the investigators have conceptualized data incompleteness as a random variable, with the overall goal behind experimentation providing a 360-degree view of this concept conceptualizing incompleteness of a dataset both as a continuous, discrete random variable depending on the aspect of the required analysis. During the course of the experiments, the investigators have identified Kolomogorov–Smirnov goodness of fit, Mielke distribution, and beta distributions as key methods to analyze the incompleteness of a dataset for the datasets used for experimentation. A comparison of these methods with a mixture density network was also performed. Overall, the investigators have provided key insights into the use of methods and algorithms that can be used to predict data incompleteness and have provided a pathway for further explorations and prediction of data incompleteness.
Keywords