Egyptian Informatics Journal (Sep 2025)

Analysis of classification metric behaviour under class imbalance

  • Jean-Pierre van Zyl,
  • Andries Petrus Engelbrecht

DOI
https://doi.org/10.1016/j.eij.2025.100711
Journal volume & issue
Vol. 31
p. 100711

Abstract

Class imbalance is a skew in the distribution of the target variable in a dataset. In other words, class imbalance occurs when the classes are represented by unequal proportions of the instances in the dataset. Although the level of class imbalance is simply an inherent property of a dataset, high levels of class imbalance cause certain evaluation metrics to report misleading assessments of a classification model's performance. This paper reviews the history of existing performance evaluation metrics for classification, and uses a normalisation process to create new variations of these existing metrics which are more robust to class imbalance. Conclusions about the performance of the analysed metrics are drawn by performing the first extensive global sensitivity analysis of classification metrics. A statistical analysis technique, namely analysis of variance, is used to analyse the robustness to class imbalance of both the existing and the proposed metrics. This paper finds that most performance evaluation metrics for classification problems are highly sensitive to class imbalance, while the newly proposed alternative metrics tend to be more robust to class imbalance.
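
As a rough illustration of how a metric can mislead under class imbalance, the sketch below scores a trivial classifier that always predicts the majority class on a hypothetical 95:5 dataset. The data, the function names, and the choice of balanced accuracy as the contrasting metric are assumptions made for this example only; balanced accuracy is a standard metric and is not claimed to be one of the normalised metrics proposed in the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, tn, fp, fn) for a binary problem."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of all instances that are classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def balanced_accuracy(y_true, y_pred):
    """Mean of the true positive rate and the true negative rate."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


# Hypothetical dataset: 95 negative instances, 5 positive instances.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)           # 0.95
bal = balanced_accuracy(y_true, y_pred)  # 0.5
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal:.2f}")
```

Accuracy reports 0.95 even though no positive instance is ever detected, whereas balanced accuracy reports 0.5, the value expected from a classifier with no discriminative ability.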

Keywords