Big Data Mining and Analytics (Feb 2025)
Desensitized Financial Data Generation Based on Generative Adversarial Network and Differential Privacy
Abstract
Artificial intelligence is widely used in finance, for example in credit risk assessment, fraud detection, and stock prediction. Training deep learning models requires a large amount of data, but financial data often contains sensitive information, some of which cannot be disclosed. Acquiring enough financial data to train deep learning models is therefore a pressing problem. This paper proposes a Noise Visibility Function-Differential Privacy Generative Adversarial Network (NVF-DPGAN) model, which generates privacy-preserving data similar to the original data and can be applied to data augmentation for deep learning. Experiments are conducted on financial data from the China Stock Market & Accounting Research (CSMAR) database. The generated data is compared with the real data from several perspectives, including mean, probability density distribution, and correlation. The experimental results show that the two datasets exhibit similar characteristics. A time series forecasting model is trained on the generated data and on the real data separately, and the resulting predictions are closely aligned. The NVF-DPGAN model is feasible and practical for financial data augmentation and privacy protection. The method can also be generalized to other fields, such as the privacy protection of medical data.
Keywords