Tạp chí Khoa học (Journal of Science), Sep 2024
IMPROVING PERFORMANCE FOR IMBALANCED DATA CLASSIFICATION USING OVERSAMPLING AND CHARACTERISTICS OF EACH CLUSTER
Abstract
This paper proposes a method for improving the classification of imbalanced data. Its main contribution is the integration of the K-means clustering algorithm with VCIR, a minority-class oversampling technique, to generate synthetic samples that closely reflect the characteristics of the actual data. Experimental results show that the proposed method outperforms widely used methods for handling imbalanced data, such as SMOTE, Borderline-SMOTE, Kmeans-SMOTE, and SVM-SMOTE, on several evaluation metrics.
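The general "cluster first, then oversample inside each cluster" idea can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not describe VCIR's generation rule, so this sketch substitutes a plain SMOTE-style linear interpolation between two minority samples that share a K-means cluster. The helper names `kmeans` and `cluster_oversample`, the proportional per-cluster allocation, and the choice of `k` are all hypothetical assumptions made for illustration only.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means (illustrative); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster becomes empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def cluster_oversample(X, y, minority, k=3, seed=0):
    """Hypothetical helper: add synthetic `minority` samples until both classes
    are the same size. Each synthetic point is a linear interpolation between
    two minority samples that share a K-means cluster (a SMOTE-style stand-in
    for VCIR, whose actual rule is not reproduced here)."""
    rng = np.random.default_rng(seed)
    labels, _ = kmeans(X, k, seed=seed)
    n_needed = int(np.sum(y != minority) - np.sum(y == minority))
    if n_needed <= 0:
        return X, y
    # Only clusters holding at least two minority samples can host interpolation.
    pools = [np.flatnonzero((labels == j) & (y == minority)) for j in range(k)]
    pools = [p for p in pools if len(p) >= 2]
    if not pools:
        raise ValueError("no cluster holds two or more minority samples")
    # Allocate synthetic samples to clusters in proportion to their minority counts.
    sizes = np.array([len(p) for p in pools], dtype=float)
    alloc = np.floor(n_needed * sizes / sizes.sum()).astype(int)
    alloc[: n_needed - alloc.sum()] += 1  # hand out the rounding remainder
    synthetic = []
    for pool, m in zip(pools, alloc):
        for _ in range(m):
            a, b = rng.choice(pool, size=2, replace=False)
            gap = rng.random()  # position on the segment between the two samples
            synthetic.append(X[a] + gap * (X[b] - X[a]))
    X_new = np.vstack([X, np.array(synthetic)])
    y_new = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_new, y_new


# Usage on a toy 90:10 two-class dataset.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(90, 2)),
               rng.normal(3.0, 0.5, size=(10, 2))])
y = np.array([0] * 90 + [1] * 10)
X_bal, y_bal = cluster_oversample(X, y, minority=1, k=3)
print(np.bincount(y_bal))  # [90 90]
```

Restricting interpolation to pairs within one cluster is what distinguishes this family of methods from plain SMOTE: synthetic points are kept near a local group of minority samples rather than being drawn across distant regions of the feature space.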
Keywords