PLOS Digital Health (Oct 2022)
High resolution data modifies intensive care unit dialysis outcome predictions as compared with low resolution administrative data set.
Abstract
High-resolution clinical databases from electronic health records are increasingly being used in the field of health data science. Compared with traditional administrative databases and disease registries, these newer, highly granular clinical datasets offer several advantages, including the availability of detailed clinical information for machine learning and the ability to adjust for potential confounders in statistical models. The purpose of this study is to compare the analysis of the same clinical research question using an administrative database and an electronic health record database. The Nationwide Inpatient Sample (NIS) was used for the low-resolution model, and the eICU Collaborative Research Database (eICU) was used for the high-resolution model. A parallel cohort of patients admitted to the intensive care unit (ICU) with sepsis and requiring mechanical ventilation was extracted from each database. The primary outcome was mortality and the exposure of interest was the use of dialysis. In the low-resolution model, after controlling for the covariates that are available, dialysis use was associated with increased mortality (eICU: OR 2.07, 95% CI 1.75-2.44, p<0.01; NIS: OR 1.40, 95% CI 1.36-1.45, p<0.01). In the high-resolution model, after the addition of the clinical covariates, the harmful effect of dialysis on mortality was no longer statistically significant (OR 1.04, 95% CI 0.85-1.28, p = 0.64). The results of this experiment show that the addition of high-resolution clinical variables to statistical models significantly improves the ability to control for important confounders that are not available in administrative datasets. This suggests that the results from prior studies using low-resolution data may be inaccurate and may need to be repeated using detailed clinical data.
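The attenuation described above, where an odds ratio shrinks toward 1 once a clinical confounder is added to the model, can be illustrated with a minimal sketch on synthetic data. Nothing here comes from NIS or eICU: the `severity` variable, the coefficients, the sample size, and the helper names `fit_logistic`, `simulate`, and `odds_ratios` are all assumptions made for illustration, and the logistic regression is fitted with a plain Newton-Raphson loop rather than whatever software the authors used. By construction, dialysis has no true effect on mortality in the simulation; only illness severity drives both.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns the coefficient vector.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # predicted probabilities
        w = p * (1.0 - p)                        # per-row variance weights
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * w[:, None])            # observed information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate(n=20000, seed=0):
    """Generate a hypothetical cohort (synthetic, not study data)."""
    rng = np.random.default_rng(seed)
    severity = rng.normal(size=n)  # assumed illness-severity score
    # Sicker patients are more likely to receive dialysis.
    p_dialysis = 1.0 / (1.0 + np.exp(-(-1.5 + 1.0 * severity)))
    dialysis = rng.binomial(1, p_dialysis).astype(float)
    # Mortality depends on severity only: the true dialysis effect is zero.
    p_death = 1.0 / (1.0 + np.exp(-(-1.0 + 1.2 * severity)))
    death = rng.binomial(1, p_death).astype(float)
    return severity, dialysis, death


def odds_ratios(seed=0):
    """Return (low-resolution OR, high-resolution OR) for dialysis."""
    severity, dialysis, death = simulate(seed=seed)
    ones = np.ones_like(severity)
    # "Low-resolution" model: severity is unavailable, so it cannot be adjusted for.
    low = fit_logistic(np.column_stack([ones, dialysis]), death)
    # "High-resolution" model: severity is included as a covariate.
    high = fit_logistic(np.column_stack([ones, dialysis, severity]), death)
    return float(np.exp(low[1])), float(np.exp(high[1]))


or_low, or_high = odds_ratios()
print(f"dialysis OR without severity adjustment: {or_low:.2f}")
print(f"dialysis OR with severity adjustment:    {or_high:.2f}")
```

In this toy setting the unadjusted odds ratio is well above 1 while the adjusted one sits close to 1, which mirrors the direction of the reported change but says nothing about its magnitude in the real cohorts.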