Machine Learning Analysis for Data Incompleteness (MADI): Analyzing the Data Completeness of Patient Records Using a Random Variable Approach to Predict the Incompleteness of Electronic Health Records

Varadraj P. Gurupur; Muhammed Shelleh

doi:10.1109/ACCESS.2021.3095240

IEEE Access (Jan 2021)

Affiliations

Varadraj P. Gurupur: ORCiD; Department of Health Management and Informatics, University of Central Florida, Orlando, FL, USA
Muhammed Shelleh: ORCiD; Department of Computer Science, University of Central Florida, Orlando, FL, USA

DOI: https://doi.org/10.1109/ACCESS.2021.3095240
Journal volume & issue: Vol. 9, pp. 95994–96001

Abstract

This article proposes a methodology that combines several methods for predicting the data incompleteness of a dataset. The investigators model data incompleteness as both a continuous and a discrete random variable. In addition, they use transfer entropy to advance the analysis of data incompleteness in electronic health records. The methodology is named “Machine Learning Analysis for Data Incompleteness” (MADI) and is intended as a possible solution to data incompleteness in electronic health records. MADI advances this analysis by using the Kolmogorov-Smirnov goodness-of-fit test together with the Mielke and beta distributions for a holistic analysis. Alongside the methodology, the investigators explored stochastic gradient descent, generalized additive models, and support vector machines for comparison. Overall, the investigators present a complete set of methods and algorithms to help predict data incompleteness in a medical setting and offer suggestions for applying them in practice.
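As a rough illustration of the goodness-of-fit step named in the abstract, the sketch below treats per-record incompleteness as a continuous random variable, fits beta and Mielke distributions to it, and compares the fits with a Kolmogorov-Smirnov test. It is not the authors' implementation: the synthetic data, the variable names, and the choice to fix the location (and, for the beta fit, the scale) are all assumptions made for the example, and SciPy is assumed as the statistics library.

```python
import numpy as np
from scipy import stats

# Hypothetical data: fraction of missing fields for each of 500 patient
# records, drawn from a beta distribution purely for illustration.
rng = np.random.default_rng(0)
incompleteness = rng.beta(2.0, 5.0, size=500)

# Fit a beta distribution on the unit interval (loc and scale held fixed).
a, b, beta_loc, beta_scale = stats.beta.fit(incompleteness, floc=0, fscale=1)

# Fit a Mielke Beta-Kappa distribution with the location held at zero.
k, s, mielke_loc, mielke_scale = stats.mielke.fit(incompleteness, floc=0)

# Kolmogorov-Smirnov goodness of fit against each fitted distribution.
# Note: the p-values are optimistic because the parameters were estimated
# from the same sample that is being tested.
ks_beta = stats.kstest(incompleteness, "beta",
                       args=(a, b, beta_loc, beta_scale))
ks_mielke = stats.kstest(incompleteness, "mielke",
                         args=(k, s, mielke_loc, mielke_scale))

print(f"beta   fit: a={a:.2f}, b={b:.2f}, KS D={ks_beta.statistic:.3f}")
print(f"mielke fit: k={k:.2f}, s={s:.2f}, KS D={ks_mielke.statistic:.3f}")
```

A smaller KS statistic D indicates a closer match between the empirical distribution of incompleteness and the fitted model, so the two values can be used to compare candidate distributions on the same sample.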

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
