Electronic Research Archive (Mar 2023)
A selective evolutionary heterogeneous ensemble algorithm for classifying imbalanced data
Abstract
Learning from imbalanced data is a challenging task, as with this type of data, most conventional supervised learning algorithms tend to favor the majority class, which has significantly more instances than the other classes. Ensemble learning is a robust solution for addressing the imbalanced classification problem. To construct a successful ensemble classifier, the diversity of base classifiers should receive specific attention. In this paper, we present a novel ensemble learning algorithm called Selective Evolutionary Heterogeneous Ensemble (SEHE), which produces diversity by two ways, as follows: 1) adopting multiple different sampling strategies to generate diverse training subsets and 2) training multiple heterogeneous base classifiers to construct an ensemble. In addition, considering that some low-quality base classifiers may pull down the performance of an ensemble and that it is difficult to estimate the potential of each base classifier directly, we profit from the idea of a selective ensemble to adaptively select base classifiers for constructing an ensemble. In particular, an evolutionary algorithm is adopted to conduct the procedure of adaptive selection in SEHE. The experimental results on 42 imbalanced data sets show that the SEHE is significantly superior to some state-of-the-art ensemble learning algorithms which are specifically designed for addressing the class imbalance problem, indicating its effectiveness and superiority.
Keywords