IMPROVING PERFORMANCE FOR IMBALANCED DATA CLASSIFICATION USING OVERSAMPLING AND CHARACTERISTICS OF EACH CLUSTER

Phan Anh Phong, Le Van Thanh

doi:10.56824/vujs.2024a054a

Tạp chí Khoa học (Sep 2024)

Affiliations

Phan Anh Phong, Le Van Thanh: Vinh University, Nghe An, Vietnam

DOI: https://doi.org/10.56824/vujs.2024a054a
Journal volume & issue: Vol. 53, no. 3A
pp. 5 – 15

Abstract

This paper proposes a method for improving the classification of imbalanced data. Its main contribution is the integration of the K-means clustering algorithm with the minority oversampling technique VCIR, so that the generated synthetic samples closely reflect the characteristics of the actual data. Experimental results show that the proposed method outperforms popular methods for handling imbalanced data, such as SMOTE, Borderline-SMOTE, Kmeans-SMOTE, and SVM-SMOTE, on several metrics.
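
The general idea of cluster-aware oversampling can be illustrated with a minimal sketch. The abstract does not describe how VCIR works, so the code below does not implement it. As a stand-in, it uses plain SMOTE-style linear interpolation, restricted to pairs of minority samples that fall in the same K-means cluster. The function names (kmeans, oversample_by_cluster), the parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def oversample_by_cluster(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating within clusters.

    Assumption: SMOTE-style interpolation stands in for VCIR here.
    """
    rng = random.Random(seed)
    labels = kmeans(minority, k, seed=seed)
    clusters = [
        [p for p, lab in zip(minority, labels) if lab == c] for c in range(k)
    ]
    # Interpolation needs at least two points in a cluster.
    clusters = [c for c in clusters if len(c) >= 2]
    if not clusters:
        raise ValueError("no cluster has two or more minority samples")
    synthetic = []
    for _ in range(n_new):
        # Larger clusters receive proportionally more synthetic samples.
        cluster = rng.choices(clusters, weights=[len(c) for c in clusters])[0]
        a, b = rng.sample(cluster, 2)
        t = rng.random()
        # New point lies on the segment between a and b.
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Toy minority class with two well-separated groups (illustrative data only).
minority = [
    (0.0, 5.0), (1.0, 5.0), (2.0, 5.0),
    (100.0, 5.0), (101.0, 5.0), (102.0, 5.0),
]
synthetic = oversample_by_cluster(minority, n_new=6, k=2)
print(synthetic)
```

Because both endpoints of every interpolation come from the same cluster, no synthetic point lands in the empty region between the two groups. Interpolating across the whole minority class at once could place points there.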

Published in Tạp chí Khoa học

ISSN: 1859-2228 (Print)
Publisher: Trường Đại học Vinh
Country of publisher: Viet Nam
LCC subjects: Technology; Social Sciences: Social sciences (General)
Website: https://vujs.vn/
