Jisuanji Kexue yu Tansuo (Journal of Frontiers of Computer Science and Technology), Mar 2020
Imbalance Classification Based on Informative Instances Selection
Abstract
Class imbalance is a common challenge for traditional classification models in practical problems. Because traditional learning algorithms cannot sufficiently learn the hidden patterns of the minority classes and may be biased toward the majority classes, minority instances are often misclassified as majority instances. Moreover, redundant and noisy data in the dataset can also degrade the classifier. To address these problems, this paper proposes SSIC, a new imbalanced classification framework. The framework fully considers the statistical properties of the dataset, adaptively selects valuable instances from the different classes, and combines them with cost-sensitive learning to construct a classifier for imbalanced data. First, SSIC constructs several balanced data subsets, each combining part of the majority-class instances with all of the minority-class instances. On each subset, SSIC exploits the characteristics of the data to extract discriminative high-level features and adaptively selects the important samples, so that redundant and noisy data can be removed. Second, SSIC introduces a cost-sensitive support vector machine (SVM) that automatically assigns a proper weight to each instance, so that the minority class is treated on an equal footing with the majority class.
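To make the two stages concrete, the following is a minimal sketch under stated assumptions, not the SSIC implementation. It assumes a binary problem, splits the shuffled majority-class indices into disjoint chunks of roughly minority-class size, and pairs each chunk with every minority instance. The abstract does not give SSIC's weighting rule, so inverse class-frequency weights, w_i = n / (C * n_{y_i}), stand in for the "automatically assigned" weights. A weighted hinge-loss objective shows how such weights enter a cost-sensitive linear SVM. The feature-extraction and instance-selection steps are omitted. The function names `balanced_subsets`, `instance_weights`, and `weighted_hinge_objective` are hypothetical.

```python
import random
from collections import Counter


def balanced_subsets(labels, minority_label, seed=0):
    """Hypothetical helper: pair disjoint chunks of majority-class indices
    with all minority-class indices (binary case assumed)."""
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority-class instances")
    rng = random.Random(seed)
    rng.shuffle(majority)
    # Number of subsets so that each majority chunk is about minority-sized.
    k = max(1, len(majority) // len(minority))
    # Round-robin slicing keeps chunk sizes within one of each other
    # and uses every majority index exactly once.
    return [sorted(majority[j::k] + minority) for j in range(k)]


def instance_weights(labels):
    """Assumed stand-in weighting: w_i = n / (C * n_{y_i}), so every class
    receives the same total weight n / C."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return [n / (c * counts[y]) for y in labels]


def weighted_hinge_objective(w, b, X, y, weights, C=1.0):
    """Cost-sensitive linear SVM objective with labels y_i in {-1, +1}:
    0.5 * ||w||^2 + C * sum_i weights_i * max(0, 1 - y_i * (w . x_i + b))."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi, si in zip(X, y, weights):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += si * max(0.0, 1.0 - margin)
    return reg + C * loss


if __name__ == "__main__":
    labels = [0] * 20 + [1] * 4  # 20 majority, 4 minority instances
    for subset in balanced_subsets(labels, minority_label=1):
        print(len(subset), subset)
    print(instance_weights(labels)[0], instance_weights(labels)[-1])
```

With 20 majority and 4 minority instances, the sketch yields five subsets of eight instances each. On the full dataset, each majority instance gets weight 0.6 and each minority instance gets weight 3.0, so both classes carry a total weight of 12. Within a balanced subset, the same rule gives every instance weight 1.0, which is one reason subset construction and cost-sensitive weighting complement each other.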
Keywords