An oversampling FCM-KSMOTE algorithm for imbalanced data classification

Hongfang Zhou; Jiahao Tong; Yuhan Liu; Kangyun Zheng; Chenhui Cao

Journal of King Saud University: Computer and Information Sciences (Dec 2024)

Affiliations

Hongfang Zhou (corresponding author): School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China; Shaanxi Key Laboratory of Network Computing and Security Technology, Xi’an 710048, China
Jiahao Tong: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Yuhan Liu: School of Finance, Hebei University of Economics and Business, Shijiazhuang 050061, China
Kangyun Zheng: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Chenhui Cao: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China

Journal volume & issue: Vol. 36, no. 10, p. 102248

Abstract

In recent years, imbalanced data classification has emerged as a challenging task. To address it, we propose a novel oversampling method named FCM-KSMOTE. The algorithm first performs density-based fuzzy clustering on the data, then iteratively partitions regions and performs oversampling inside each cluster. Next, it merges the clusters and applies noise detection to obtain a balanced dataset. Experiments were conducted on 19 public datasets and 3 synthetic datasets using six evaluation metrics: Recall, Accuracy, G-mean, Specificity, AUC and F1-Score. The experimental results demonstrate that our method can significantly improve the recognition rate of the minority class while maintaining high accuracy on the majority class. With the random forest (RF) classifier in particular, our method ranks first on all six metrics, and its Recall exceeds that of the worst-performing compared method by up to 0.2, demonstrating a substantial performance advantage.
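The abstract describes the pipeline only at a high level. As an illustration, here is a minimal Python sketch of the general idea of cluster-wise oversampling, under loudly stated simplifying assumptions: standard fuzzy c-means stands in for the paper's density-based fuzzy clustering, each sample is hard-assigned to its highest-membership cluster, and SMOTE-style linear interpolation between nearby minority samples inside the same cluster generates synthetic points until both classes are the same size. The region-partitioning and noise-detection steps are not reproduced, inputs are plain Python lists, and the names `fuzzy_c_means` and `fcm_smote_sketch` are hypothetical. This is not the authors' FCM-KSMOTE implementation.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means (hypothetical helper); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers are means weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][k] for i in range(n)) / sw for k in range(d)])
        # Memberships are recomputed from relative distances to every center.
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            for j in range(c):
                u[i][j] = 1.0 / sum((dist[j] / dk) ** (2.0 / (m - 1.0)) for dk in dist)
    return centers, u


def fcm_smote_sketch(X, y, minority=1, c=2, k=3, seed=0):
    """Hypothetical cluster-wise SMOTE; NOT the authors' FCM-KSMOTE implementation."""
    rng = random.Random(seed)
    _, u = fuzzy_c_means(X, c, seed=seed)
    # Assumption: hard-assign each sample to its highest-membership cluster.
    labels = [row.index(max(row)) for row in u]
    groups = [[x for x, yi, lab in zip(X, y, labels) if yi == minority and lab == j]
              for j in range(c)]
    usable = [g for g in groups if len(g) >= 2]  # interpolation needs >= 2 points
    if not usable:
        # Fallback: treat all minority samples as a single cluster.
        usable = [[x for x, yi in zip(X, y) if yi == minority]]
        if len(usable[0]) < 2:
            raise ValueError("need at least two minority samples")
    n_min = sum(1 for yi in y if yi == minority)
    n_new = max(len(y) - 2 * n_min, 0)  # binary case: raise minority to majority size
    synthetic = []
    for _ in range(n_new):
        # Pick a cluster with probability proportional to its minority count.
        group = rng.choices(usable, weights=[len(g) for g in usable])[0]
        i = rng.randrange(len(group))
        # k nearest minority neighbours of group[i] within the same cluster.
        near = sorted((j for j in range(len(group)) if j != i),
                      key=lambda j: math.dist(group[i], group[j]))[:k]
        a, b = group[i], group[rng.choice(near)]
        gap = rng.random()  # SMOTE: random point on the segment from a to b
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return X + synthetic, y + [minority] * len(synthetic)
```

Restricting interpolation to neighbours within one cluster keeps synthetic points inside the local structure of the minority class instead of bridging distant minority regions across majority territory, which is the usual motivation for clustering before SMOTE. The sketch discards the fuzzy memberships after hard assignment; how the paper actually uses them is not stated in the abstract.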
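The abstract lists six evaluation metrics without defining them. The sketch below uses the standard binary-classification definitions with the minority class treated as positive; these definitions are an assumption, since the paper's exact formulas (for example, how AUC is computed) are not given here. AUC is computed as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half. The function names are hypothetical.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics with the minority class as `positive` (assumed convention)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "recall": recall,
        "specificity": specificity,
        "g_mean": math.sqrt(recall * specificity),    # geometric mean of TPR and TNR
        "accuracy": (tp + tn) / len(pairs),
        "f1": f1,
    }


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
# recall 2/3, specificity 6/7, accuracy 0.8, f1 2/3, g_mean ≈ 0.756
print(imbalance_metrics(y_true, y_pred))
```

G-mean shows why accuracy alone is misleading on imbalanced data: a classifier that predicts the majority class for every sample in this example would still reach an accuracy of 0.7, but its Recall, and therefore its G-mean, would be 0.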

Published in Journal of King Saud University: Computer and Information Sciences

ISSN: 1319-1578 (Print)
Publisher: Elsevier
Country of publisher: Saudi Arabia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.journals.elsevier.com/journal-of-king-saud-university-computer-and-information-sciences/