IEEE Access (Jan 2020)

Uncertainty Based Under-Sampling for Learning Naive Bayes Classifiers Under Imbalanced Data Sets

  • Christos K. Aridas,
  • Stamatis Karlos,
  • Vasileios G. Kanas,
  • Nikos Fazakis,
  • Sotiris B. Kotsiantis

DOI
https://doi.org/10.1109/ACCESS.2019.2961784
Journal volume & issue
Vol. 8
pp. 2122 – 2133

Abstract


In many real-world classification tasks, not all classes are represented equally. This problem, also known as the class imbalance problem, can affect the training procedure of a classifier by producing a model that is biased in favor of the majority class. In the work at hand, an under-sampling approach is proposed that leverages a Naive Bayes classifier to select the most informative instances from the available training set, starting from a random initial selection. The method first learns a Naive Bayes classification model on a small stratified initial training set. It then iteratively augments this set with the instances the model is most uncertain about and retrains the model until certain stopping criteria are satisfied. The overall performance of the proposed method has been scrutinized through a rigorous experimental procedure, tested on six multimodal data sets as well as forty-four standard benchmark data sets. The empirical results indicate that the proposed under-sampling method achieves classification performance comparable to other resampling techniques across several appropriate metrics, as confirmed by a suitable statistical testing procedure.
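The iterative procedure described in the abstract (small stratified seed set, then repeatedly adding the instances the model is least confident about) can be sketched as follows. This is a minimal illustration using scikit-learn's `GaussianNB`; the function name, batch size, least-confidence uncertainty measure, and fixed-round stopping criterion are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

def uncertainty_undersample(X, y, init_size=0.1, batch=10, max_rounds=20, seed=0):
    """Select an informative subset of (X, y) by iteratively adding the
    instances a Naive Bayes model is most uncertain about.

    Illustrative sketch: parameters and stopping rule are assumptions.
    """
    # Small stratified initial training set; the remainder forms the pool.
    X_pool, X_sel, y_pool, y_sel = train_test_split(
        X, y, test_size=init_size, stratify=y, random_state=seed)
    for _ in range(max_rounds):
        if len(y_pool) == 0:
            break
        model = GaussianNB().fit(X_sel, y_sel)
        proba = model.predict_proba(X_pool)
        # Least-confidence uncertainty: 1 minus the top class probability.
        uncertainty = 1.0 - proba.max(axis=1)
        idx = np.argsort(uncertainty)[-batch:]  # most uncertain instances
        # Move the selected instances from the pool into the training set.
        X_sel = np.vstack([X_sel, X_pool[idx]])
        y_sel = np.concatenate([y_sel, y_pool[idx]])
        mask = np.ones(len(y_pool), dtype=bool)
        mask[idx] = False
        X_pool, y_pool = X_pool[mask], y_pool[mask]
    return X_sel, y_sel
```

The returned subset is what a downstream classifier would be trained on, so the majority class contributes only its hardest-to-classify instances rather than its full bulk.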
