Acta Electrotechnica et Informatica (Dec 2022)

Principles of Synthesizing Medical Datasets

  • Kolárik Michal,
  • Gojdičová Lucia,
  • Paralič Ján

DOI
https://doi.org/10.2478/aei-2022-0019
Journal volume & issue
Vol. 22, no. 4
pp. 25 – 29

Abstract

Data in many application domains are a valuable source for analysis and data-driven decision support. At the same time, legislation restricts the use of such data, especially personal data and patient data in the medical domain. To maximize the use of data for decision support while complying with legislation, sensitive data must be properly anonymized or synthesized. This article contributes to the area of medical record synthesis. We first introduce the topic and place it in a broader context, covering the methods used and the metrics for evaluating them. Based on an analysis of related work, we selected the CTGAN neural network model for data synthesis and validated it experimentally on three different medical datasets. The results were evaluated both quantitatively, using selected metrics, and qualitatively, using suitable visualization techniques. They showed that in most cases the synthesized dataset is a very good approximation of the original one, with similar prediction performance.

Keywords