Journal of Big Data (Aug 2022)
The use of generative adversarial networks to alleviate class imbalance in tabular data: a survey
Abstract
The existence of class imbalance in a dataset can greatly bias a classifier towards the majority class. This discrepancy can pose a serious problem for deep learning models, which require copious and diverse amounts of data to learn patterns and output classifications. Traditionally, data-level and algorithm-level techniques have been instrumental in mitigating the adverse effect of class imbalance. With the recent development and proliferation of Generative Adversarial Networks (GANs), researchers across a variety of disciplines have adapted the GAN architecture and applied it to imbalanced datasets to generate instances of the underrepresented class(es). Though the bulk of this research has centered on computer vision tasks, GANs are likewise being adapted for tabular data, that is, data consisting of rows and columns with traditional structured data types. In this survey paper, we assess the methodology and efficacy of these modifications on tabular datasets over the past seven years, across domains such as network traffic classification and financial transactions. We examine which methodologies and experimental factors have resulted in the greatest machine learning efficacy, as well as the research works and frameworks that have proven most influential in the development of GANs for tabular data settings. Specifically, we note the prevalence of the CGAN architecture, the optimality of novel methods with CNN learners and minority-class sensitive measures such as the F1 score, the popularity of SMOTE as a baseline technique, and the year-over-year improvement in performance of GANs applied to imbalanced tabular datasets.
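To make concrete why a majority-biased classifier can look deceptively good and why minority-class sensitive measures such as the F1 score are preferred, the following minimal Python sketch scores two hypothetical classifiers on a synthetic 95:5 label split. The labels, predictions, and function names are illustrative assumptions, not taken from any surveyed study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct, regardless of class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """F1 score for the (minority) positive class."""
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    if tp == 0:
        # Precision and recall are zero (or undefined), so report F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced labels: 95 majority (0) and 5 minority (1) instances.
y_true = [0] * 95 + [1] * 5

# Degenerate classifier that always predicts the majority class.
y_majority = [0] * 100

# Classifier that flags 3 majority rows by mistake but catches 4 of the 5 minority rows.
y_balanced = [0] * 92 + [1] * 3 + [1] * 4 + [0] * 1

print(f"{accuracy(y_true, y_majority):.2f} {f1_score(y_true, y_majority):.2f}")  # 0.95 0.00
print(f"{accuracy(y_true, y_balanced):.2f} {f1_score(y_true, y_balanced):.2f}")  # 0.96 0.67
```

Accuracy barely separates the two classifiers (0.95 versus 0.96), while the minority-class F1 score exposes that the first one never detects the underrepresented class.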
Keywords