Precision and Future Medicine (Sep 2024)
Computationally efficient and stable real-world synthetic emergency room electronic health record data generation: high similarity and privacy preserving diffusion model approach: A retrospective cohort study
Abstract
Purpose: This study aimed to develop real-world synthetic electronic health records (EHRs) for emergency departments using computationally efficient and stable diffusion probabilistic models.
Methods: We compared the performance of diffusion models and state-of-the-art generative adversarial networks (GANs) in terms of statistical similarity, privacy, medical usefulness, and the feasibility of using synthetic data for machine learning purposes.
Results: Diffusion models were significantly more computationally efficient than GANs and performed comparably or slightly better in terms of similarity, privacy, and utility. The data generated by the diffusion model were statistically very similar to the real data for both categorical and continuous variables, and the model could address class imbalance precisely. Moreover, the usefulness of the synthetic data was almost identical to that of real EHR data. Our privacy analysis showed that the synthetic data generated by the diffusion models preserved privacy.
Conclusion: These findings have significant implications for improving the efficiency of emergency settings and enabling real-time emergency room data modeling. Diffusion models can generate real-world synthetic EHRs that are computationally efficient to produce, private, and of high quality, and that can be used for machine learning purposes in emergency settings.
Keywords