IEEE Access (Jan 2021)

An Over Sampling Method of Unbalanced Data Based on Ant Colony Clustering

  • Gao Yang,
  • Liu Qicheng

DOI
https://doi.org/10.1109/ACCESS.2021.3114443
Journal volume & issue
Vol. 9
pp. 130990 – 130996

Abstract

To address the low classification accuracy on unbalanced data sets, an improved SMOTE over-sampling algorithm based on ant colony clustering, ACC-SMOTE (Ant Colony Clustering Synthetic Minority Oversampling Technology), is proposed. On the one hand, an improved ant colony clustering algorithm divides the minority-class samples into sub-clusters, so that both inter-cluster and intra-cluster imbalance are taken into account, and the SMOTE algorithm then oversamples each sub-cluster according to its proportion, reducing the imbalance within the class. On the other hand, Tomek Links data cleaning is applied to the oversampled data: removing noise in the data set and the overlapping samples generated by the sampling step ensures the quality of the synthetic samples. Both the training and the test data sets used in this paper are UCI data sets. The experimental results show that the algorithm significantly improves classification accuracy on the minority class, and thereby the overall performance of the classifier.
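
The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the ant colony clustering step is not reproduced, so the minority sub-clusters are assumed to be given as input. All function names are hypothetical. Allocating synthetic samples in direct proportion to sub-cluster size is only one reading of "according to the proportion of sub-clusters", and dropping both members of each Tomek link is likewise an assumption.

```python
import math
import random


def smote_within_cluster(points, n_new, k=3, rng=random):
    """Create n_new synthetic points by interpolating between a point and
    one of its k nearest neighbours from the same sub-cluster."""
    if not points:
        return []
    if len(points) < 2:
        # A single-point cluster has no neighbour; fall back to copies.
        return [list(points[0]) for _ in range(n_new)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(points)
        neighbours = sorted((p for p in points if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def oversample_by_cluster(clusters, n_total, k=3, rng=random):
    """Split n_total synthetic samples across sub-clusters in proportion to
    their size (assumed weighting; rounding may shift the total slightly)."""
    size = sum(len(c) for c in clusters)
    synthetic = []
    for cluster in clusters:
        share = round(n_total * len(cluster) / size)
        synthetic.extend(smote_within_cluster(cluster, share, k, rng))
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, whose points carry different labels
    and are each other's nearest neighbour."""
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (assumed cleaning rule)."""
    drop = {i for pair in tomek_links(X, y) for i in pair}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]
```

In use, the synthetic points returned by `oversample_by_cluster` would be appended to the training set with the minority label before `remove_tomek_links` is run on the combined data, so that cleaning acts on the oversampled set as the abstract describes.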

Keywords