Journal of King Saud University: Computer and Information Sciences (Dec 2024)
An oversampling FCM-KSMOTE algorithm for imbalanced data classification
Abstract
In recent years, imbalanced data classification has emerged as a challenging task. To address this issue, we propose a novel oversampling method named FCM-KSMOTE. The algorithm first performs density-based fuzzy clustering on the data, then iteratively partitions regions and oversamples within each cluster. Next, it merges the clusters and applies noise detection to obtain a balanced dataset. We conducted experiments on 19 public datasets and 3 synthetic datasets, using six evaluation metrics: Recall, Accuracy, G-mean, Specificity, AUC, and F1-Score. The experimental results demonstrate that our method can significantly improve the recognition rate of the minority class while maintaining high accuracy for the majority class. In particular, with the RF classifier, our method ranks first on all evaluation metrics, with a Recall advantage of up to 0.2 over the lowest-performing method, which demonstrates a substantial performance advantage.
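For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how five of the six are commonly computed from a binary confusion matrix, with the minority class treated as the positive class. It uses the standard textbook definitions and is not taken from the paper; the function name `imbalance_metrics` and its `positive` parameter are hypothetical, chosen here for illustration. AUC is omitted because it requires classifier scores rather than hard predictions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Standard confusion-matrix metrics (hypothetical helper, not from the paper).

    The minority class is assumed to be the `positive` label.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard every ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # minority-class hit rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # majority-class hit rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # G-mean is the geometric mean of Recall and Specificity.
    g_mean = math.sqrt(recall * specificity)

    return {
        "Recall": recall,
        "Specificity": specificity,
        "Accuracy": accuracy,
        "F1-Score": f1,
        "G-mean": g_mean,
    }


# Toy example: 2 minority samples (label 1) among 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = imbalance_metrics(y_true, y_pred)
```

In this toy example Accuracy is 0.8 even though only half of the minority samples are recovered (Recall 0.5), which illustrates why Recall and G-mean are reported alongside Accuracy on imbalanced data.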