Journal of Algorithms & Computational Technology (Nov 2023)

Addressing the class imbalance in tabular datasets from a generative adversarial network approach in supervised machine learning

  • Máximo E Sánchez-Gutiérrez,
  • Pedro P González-Pérez

DOI
https://doi.org/10.1177/17483026231215186
Journal volume & issue
Vol. 17

Abstract

One common issue with datasets used for supervised classification tasks is data imbalance, the unequal distribution of classes within a dataset. Class imbalance can bias machine learning models toward the dominant class, causing them to misclassify the minority class. Techniques for dealing with class imbalance include resampling (oversampling or undersampling) and ensemble approaches. In addition, generative adversarial networks, a deep learning technique for building generative models, offer an alternative that is particularly well suited to the class imbalance problem. This work introduces a machine learning-based approach to deal with class imbalance in a cancer intracellular signaling dataset produced by a verified and validated computer simulation. Specifically, we use synthetic data generation to enlarge and balance the dataset generated by the computational simulation. The approach emulates oversampling by employing a generative adversarial network to produce new examples for the minority class. Subsequently, we applied supervised machine learning methods, such as the K-NN algorithm, to assess whether classification accuracy improved relative to the unbalanced dataset. The results show an increase in accuracy when classifying patterns belonging to the minority class, with an improvement of 24.5%.
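The workflow described in the abstract (generate synthetic minority examples until the classes are balanced, then compare a K-NN classifier trained on the unbalanced and the balanced data) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the data are a toy two-feature set rather than the cancer signaling dataset, and `fit_stand_in_generator` is a hypothetical placeholder that samples from a per-feature Gaussian where a trained GAN generator would normally be called. All function names and parameters here are invented for the example.

```python
import math
import random
from collections import Counter


def make_imbalanced(n_major, n_minor, rng):
    """Toy 2-feature data: label 0 is the majority class, label 1 the minority."""
    major = [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(n_major)]
    minor = [([rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)], 1) for _ in range(n_minor)]
    return major + minor


def fit_stand_in_generator(rows, rng):
    """ASSUMPTION: stand-in for a trained GAN generator.

    Fits an independent Gaussian to each feature of the minority rows and
    returns a zero-argument function that draws one synthetic row.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return lambda: [rng.gauss(m, s) for m, s in zip(means, stds)]


def oversample(train, minority_label, rng):
    """Add synthetic minority rows until the minority matches the largest class."""
    counts = Counter(label for _, label in train)
    deficit = max(counts.values()) - counts[minority_label]
    minority_rows = [x for x, label in train if label == minority_label]
    generate = fit_stand_in_generator(minority_rows, rng)
    return train + [(generate(), minority_label) for _ in range(deficit)]


def knn_predict(train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def minority_recall(train, test, minority_label, k=5):
    """Fraction of minority-class test rows that K-NN labels correctly."""
    minority_test = [x for x, label in test if label == minority_label]
    hits = sum(knn_predict(train, x, k) == minority_label for x in minority_test)
    return hits / len(minority_test)


rng = random.Random(0)
train = make_imbalanced(200, 20, rng)   # 10:1 imbalance in the training data
test = make_imbalanced(100, 100, rng)   # balanced held-out data for evaluation
balanced = oversample(train, minority_label=1, rng=rng)

recall_before = minority_recall(train, test, minority_label=1)
recall_after = minority_recall(balanced, test, minority_label=1)
print(f"minority recall, unbalanced training set: {recall_before:.3f}")
print(f"minority recall, balanced training set:   {recall_after:.3f}")
```

Synthetic rows are added only to the training data, while the held-out data contains only rows drawn from the original distributions, so the comparison reflects performance on real examples. The numbers this toy example prints have no bearing on the 24.5% improvement reported for the actual dataset.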