IEEE Access (Jan 2021)

Machine Learning Analysis for Data Incompleteness (MADI): Analyzing the Data Completeness of Patient Records Using a Random Variable Approach to Predict the Incompleteness of Electronic Health Records

  • Varadraj P. Gurupur,
  • Muhammed Shelleh

DOI
https://doi.org/10.1109/ACCESS.2021.3095240
Journal volume & issue
Vol. 9
pp. 95994 – 96001

Abstract


The purpose of this article is to propose a methodology, combining several methods, that can be used to predict the data incompleteness of a dataset. The investigators model data incompleteness as both a continuous and a discrete random variable. In addition, they use transfer entropy to advance the analysis of data incompleteness in electronic health records. The methodology, named “Machine Learning Analysis for Data Incompleteness” (MADI), is intended to provide a possible solution to data incompleteness in electronic health records. MADI analyzes data incompleteness holistically by combining the Kolmogorov-Smirnov goodness-of-fit test with the Mielke and beta distributions. For comparison, the investigators also explored stochastic gradient descent, generalized additive models, and support vector machines. Overall, the investigators present a complete set of methods and algorithms to help predict data incompleteness in a medical setting, and they offer suggestions for practical applications of this prediction.
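The abstract describes treating per-record incompleteness as a random variable and checking how well a beta distribution fits it with a Kolmogorov-Smirnov test. The stdlib-only Python sketch below illustrates that idea and rests on several assumptions. The incompleteness score is taken to be the fraction of expected fields that are missing, which is a hypothetical definition and not necessarily the one MADI uses. The beta parameters are estimated by the method of moments, which may differ from the authors' estimator. The Mielke fit and the transfer-entropy step are omitted. All function names (`missing_fraction`, `fit_beta_moments`, `beta_cdf`, `ks_statistic`, `ks_pvalue`) are hypothetical.

```python
import math
import random
from typing import Callable, Sequence


def missing_fraction(record: dict, fields: Sequence[str]) -> float:
    """Hypothetical continuous incompleteness score: share of expected fields
    that are absent, None, or the empty string in one patient record."""
    missing = sum(1 for f in fields if record.get(f) in (None, ""))
    return missing / len(fields)


def fit_beta_moments(xs: Sequence[float]) -> tuple[float, float]:
    """Method-of-moments estimate of Beta(a, b) parameters from scores in [0, 1]."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    if not (0.0 < mean < 1.0 and 0.0 < var < mean * (1.0 - mean)):
        raise ValueError("sample mean/variance cannot come from a beta distribution")
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued-fraction expansion.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x: float, a: float, b: float) -> float:
    """Regularized incomplete beta I_x(a, b), i.e. the Beta(a, b) CDF."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ks_statistic(xs: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    ys = sorted(xs)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys):
        f = cdf(y)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d: float, n: int) -> float:
    """Asymptotic Kolmogorov p-value Q(sqrt(n) * D); conservative if the
    distribution's parameters were fitted to the same sample."""
    lam = math.sqrt(n) * d
    if lam < 1e-3:
        return 1.0
    total, sign, prev = 0.0, 1.0, 0.0
    for k in range(1, 101):
        term = sign * 2.0 * math.exp(-2.0 * (k * lam) ** 2)
        total += term
        if abs(term) <= 1e-3 * prev or abs(term) <= 1e-8 * abs(total):
            return min(max(total, 0.0), 1.0)
        sign, prev = -sign, abs(term)
    return 1.0


if __name__ == "__main__":
    random.seed(0)
    # Synthetic scores for illustration only; not data from the article.
    scores = [random.betavariate(2.0, 8.0) for _ in range(2000)]
    a, b = fit_beta_moments(scores)
    d = ks_statistic(scores, lambda x: beta_cdf(x, a, b))
    print(f"a={a:.2f}, b={b:.2f}, D={d:.4f}, p~{ks_pvalue(d, len(scores)):.3f}")
```

The incomplete beta function is evaluated with the standard continued-fraction expansion so that the sketch needs no third-party packages. In practice, `scipy.stats.beta.fit` and `scipy.stats.kstest` cover the same steps. Because the beta parameters are fitted from the same sample that is being tested, the asymptotic p-value is conservative. A parametric bootstrap of D gives a more honest reference distribution. A score of exactly 0 for a fully complete record is a point mass, which a continuous beta model cannot represent. Such records would likely need separate handling, for example by treating the count of missing fields as a discrete random variable.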

Keywords