Deep Generative Models to Counter Class Imbalance: A Model-Metric Mapping With Proportion Calibration Methodology

Behroz Mirza; Danish Haroon; Behraj Khan; Ali Padhani; Tahir Q. Syed

doi:10.1109/ACCESS.2021.3071389

IEEE Access (Jan 2021)

Affiliations

Behroz Mirza: School of Computing, National University of Computer and Emerging Science, Karachi, Pakistan
Danish Haroon: School of Computing, National University of Computer and Emerging Science, Karachi, Pakistan
Behraj Khan: School of Computing, National University of Computer and Emerging Science, Karachi, Pakistan
Ali Padhani: School of Computing, National University of Computer and Emerging Science, Karachi, Pakistan
Tahir Q. Syed: Institute of Business Administration, Karachi, Pakistan

DOI: https://doi.org/10.1109/ACCESS.2021.3071389
Journal volume & issue: Vol. 9, pp. 55879–55897

Abstract

Re-sampling-based methods are the most pervasive family of techniques for managing class imbalance in machine learning. The emergence of deep generative models for augmenting the under-represented class prompts a review of how well the model chosen for data augmentation suits the metric selected to measure the goodness of classification. This work defines this suitability by first using newly sampled data points from each generative model to augment the minority class to the degree of parity, and then studying classification performance on a large set of metrics. We extend the investigation to different proportions of augmented data points to identify the sensitivity of each metric to the degree of imbalance, leading to the discovery of an optimum proportion for each metric. The models used are GAN, VAE and RBM, and the metrics include Precision, Recall, F1-Score, AUC, G-Mean and Balanced Accuracy. We compare these models with established data-synthesizing counterparts on the aforementioned metrics. Deep generative models outperform the state of the art on 5 metrics across multiple datasets and also comprehensively surpass the baselines. This work thereby recommends the following model-metric mappings: VAE for high Precision and F1-Score, RBM for high Recall, and GAN for high AUC, G-Mean and Balanced Accuracy, under various recommended proportions of the minority class.
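
As an illustration of the quantities named in the abstract, the following minimal Python sketch computes how many synthetic minority points would be needed to reach a given minority-to-majority proportion, and evaluates the threshold-based metrics from confusion-matrix counts. It is not the authors' code: the function names `synthetic_count` and `imbalance_metrics`, the `target_ratio` parameter, and all counts are assumptions made for illustration, and the metric formulas are the standard textbook definitions with the minority class treated as positive. AUC is omitted because it requires classifier scores rather than counts.

```python
import math


def synthetic_count(n_majority, n_minority, target_ratio):
    """Synthetic minority points needed so that minority / majority
    reaches target_ratio (1.0 corresponds to parity)."""
    needed = math.ceil(target_ratio * n_majority) - n_minority
    return max(needed, 0)  # never remove points if already above target


def imbalance_metrics(tp, fp, tn, fn):
    """Standard definitions; the minority class is the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": math.sqrt(recall * specificity),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical counts, for illustration only (not taken from the paper).
print(synthetic_count(n_majority=900, n_minority=100, target_ratio=1.0))
print(imbalance_metrics(tp=80, fp=20, tn=880, fn=20))
```

Sweeping `target_ratio` over several values and re-evaluating a classifier at each one is one plausible way to look for a metric-specific optimum proportion of the kind the abstract describes.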

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
