Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment

Faisal Ramzan; Claudio Sartori; Sergio Consoli; Diego Reforgiato Recupero

doi:10.3390/ai5020035

AI (May 2024)

Affiliations

Faisal Ramzan: Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy
Claudio Sartori: Department of Computer Science and Engineering, University of Bologna, 40126 Bologna, Italy
Sergio Consoli: Joint Research Centre (DG JRC), European Commission, 1050 Brussels, Belgium
Diego Reforgiato Recupero: Department of Mathematics and Computer Science, University of Cagliari, 09124 Cagliari, Italy

DOI: https://doi.org/10.3390/ai5020035
Journal volume & issue: Vol. 5, no. 2
pp. 667–685

Abstract

Generating synthetic data is a complex task that necessitates accurately replicating the statistical and mathematical properties of the original data elements. In sectors such as finance, utilizing and disseminating real data for research or model development can pose substantial privacy risks owing to the inclusion of sensitive information. Additionally, authentic data may be scarce, particularly in specialized domains where acquiring ample, varied, and high-quality data is difficult or costly. This scarcity or limited data availability can limit the training and testing of machine-learning models. In this paper, we address this challenge. In particular, our task is to synthesize a dataset with similar properties to an input dataset about the stock market. The input dataset is anonymized and consists of very few columns and rows, contains many inconsistencies, such as missing rows and duplicates, and its values are not normalized, scaled, or balanced. We explore the utilization of generative adversarial networks, a deep-learning technique, to generate synthetic data and evaluate its quality compared to the input stock dataset. Our innovation involves generating artificial datasets that mimic the statistical properties of the input elements without revealing complete information. For example, synthetic datasets can capture the distribution of stock prices, trading volumes, and market trends observed in the original dataset. The generated datasets cover a wider range of scenarios and variations, enabling researchers and practitioners to explore different market conditions and investment strategies. This diversity can enhance the robustness and generalization of machine-learning models. We evaluate our synthetic data in terms of the mean, similarities, and correlations.
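The evaluation named in the abstract's last sentence (means and correlations) can be illustrated with a minimal sketch. This is not the authors' code: the function name `compare_datasets`, the two-column Gaussian toy data, and the choice of absolute gaps as the comparison are all assumptions made here for illustration, and the toy "synthetic" sample is simply a second random draw rather than GAN output.

```python
import numpy as np


def compare_datasets(real, synthetic):
    """Compare per-column means and correlation matrices of two 2-D arrays.

    Hypothetical helper: rows are observations, columns are features.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # Absolute difference between the column means of the two datasets.
    mean_gap = np.abs(real.mean(axis=0) - synthetic.mean(axis=0))

    # Pearson correlation matrices, treating columns as variables.
    corr_real = np.corrcoef(real, rowvar=False)
    corr_syn = np.corrcoef(synthetic, rowvar=False)

    # Largest absolute entry-wise difference between the two matrices.
    corr_gap = float(np.abs(corr_real - corr_syn).max())

    return {"mean_gap": mean_gap, "corr_gap": corr_gap}


# Toy stand-in data (an assumption): two correlated columns, e.g. price and volume.
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
real = rng.multivariate_normal([100.0, 50.0], cov, size=5000)
synthetic = rng.multivariate_normal([100.0, 50.0], cov, size=5000)

report = compare_datasets(real, synthetic)
print(report)
```

Small gaps suggest that the synthetic sample reproduces the first-order statistics and pairwise linear dependencies of the original; they do not by themselves establish that the full distributions match.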

Published in AI

ISSN: 2673-2688 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/ai
