IEEE Access (Jan 2021)
Multiple Kernel Learning With Minority Oversampling for Classifying Imbalanced Data
Abstract
Class imbalance problems, which arise from sampling bias or measurement error, occur frequently in real-world pattern classification tasks. Traditional classifiers focus on overall classification accuracy and neglect the minority class, which may degrade classification performance. Moreover, existing oversampling algorithms generally make specific assumptions to balance the class sizes and do not sufficiently account for irregularities present in imbalanced data; as a result, these methods perform well only on certain benchmarks. In this paper, by incorporating minority oversampling and cost-sensitive learning, we propose multiple kernel learning with minority oversampling (MKLMO) to efficiently handle class imbalance in the presence of small disjuncts, class overlapping, and nonlinear class shapes. Unlike existing methods, which first oversample the minority class and then deploy a standard classifier on the rebalanced data, the proposed MKLMO generates synthetic instances and trains the classifier simultaneously in the same feature space. Specifically, we define a distance metric in the optimal feature space obtained by multiple kernel learning and use the kernel trick to expand the original Gram matrix. Moreover, we assign different weights to instances, based on the imbalance ratio, to reduce the bias of the classifier towards the majority class. To evaluate the proposed MKLMO method, experiments are performed on nine artificial and twenty-one real-world datasets. The experimental results show that our algorithm significantly outperforms baseline algorithms in terms of the geometric mean (G-mean), especially in the presence of data irregularities.
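The core mechanics mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the RBF kernel, the `gamma` value, and all function names here are illustrative assumptions. It only demonstrates three well-known building blocks the abstract relies on: by linearity of the feature-space inner product, a synthetic point interpolated between two minority instances needs no explicit pre-image, so the Gram matrix can be expanded with kernel evaluations alone; per-instance weights can be set inversely proportional to class frequency; and G-mean is the geometric mean of sensitivity and specificity.

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (illustrative kernel choice)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def expand_gram(K, i, j, delta):
    """Append a row/column for the synthetic point z = (1-delta)*phi(x_i) + delta*phi(x_j).

    By linearity of the inner product in feature space,
    K(z, x_k) = (1-delta)*K[i, k] + delta*K[j, k], so z is handled entirely
    through kernel values and never needs a pre-image in input space.
    """
    n = K.shape[0]
    kz = (1 - delta) * K[i, :] + delta * K[j, :]
    kzz = ((1 - delta) ** 2 * K[i, i]
           + 2 * delta * (1 - delta) * K[i, j]
           + delta ** 2 * K[j, j])
    K_new = np.empty((n + 1, n + 1))
    K_new[:n, :n] = K
    K_new[:n, n] = kz
    K_new[n, :n] = kz
    K_new[n, n] = kzz
    return K_new

def imbalance_weights(y):
    """Per-instance weights inversely proportional to class frequency,
    so the minority class is weighted up relative to the majority class."""
    classes, counts = np.unique(y, return_counts=True)
    w = {c: len(y) / (len(classes) * cnt) for c, cnt in zip(classes, counts)}
    return np.array([w[c] for c in y])

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return np.sqrt(tpr * tnr)
```

A classifier trained on the expanded Gram matrix with these weights would see the synthetic minority instances in the same feature space it is optimized in, which is the property the abstract contrasts against oversample-then-train pipelines.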
Keywords