Challenges of Using Synthetic Data Generation Methods for Tabular Microdata

Marko Miletic; Murat Sariyar

doi:10.3390/app14145975

Applied Sciences (Jul 2024)

Affiliations

Marko Miletic: School of Engineering and Computer Science, Bern University of Applied Sciences, Quellgasse 21, 2502 Bienne, Switzerland
Murat Sariyar: School of Engineering and Computer Science, Bern University of Applied Sciences, Quellgasse 21, 2502 Bienne, Switzerland

DOI: https://doi.org/10.3390/app14145975
Journal volume & issue: Vol. 14, no. 14, p. 5975

Abstract

The generation of synthetic data holds significant promise for augmenting limited datasets while avoiding privacy issues, facilitating research, and enhancing machine learning models’ robustness. Generative Adversarial Networks (GANs) stand out as promising tools, employing two neural networks, a generator and a discriminator, to produce synthetic data that mirrors real data distributions. This study evaluates GAN variants (CTGAN, CopulaGAN), a tabular variational autoencoder (TVAE), and copulas on diverse real datasets of differing complexity that encompass numerical and categorical attributes. The results highlight CTGAN’s sensitivity to training parameters and TVAE’s robustness across datasets. Scalability challenges persist, with GANs demanding substantial computational resources. TVAE stands out for its high utility across all datasets, even for high-dimensional data, though it incurs higher privacy risks, which is indicative of the curse of dimensionality. While no single model universally excels, understanding the trade-offs and leveraging model strengths can significantly enhance synthetic data generation (SDG). Future research should focus on adaptive learning mechanisms, scalability enhancements, and standardized evaluation metrics to advance SDG methods effectively. Addressing these challenges will foster broader adoption and application of synthetic data.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
