IEEE Access (Jan 2024)

Hybrid Undersampling and Oversampling for Handling Imbalanced Credit Card Data

  • Maram Alamri
  • Mourad Ykhlef

DOI
https://doi.org/10.1109/ACCESS.2024.3357091
Journal volume & issue
Vol. 12
pp. 14050 – 14060

Abstract

The growing use of credit cards for a range of daily activities has increased credit card fraud and caused huge financial losses for individuals and financial institutions. Most credit card fraud is conducted online through illegal payment authorizations enabled by data breaches, phishing, or scams. Many solutions have been suggested for this issue, but they all face the major challenge of building an effective detection model from highly class-imbalanced data. Most sampling techniques used to address class imbalance have limitations, such as class overlap and overfitting, which cause inaccurate learning, and they are slowed down by noisy features. Herein, a hybrid Tomek links BIRCH Clustering Borderline SMOTE (BCBSMOTE) sampling method is proposed to balance a highly skewed credit card transaction dataset. First, Tomek links were used to undersample majority instances and remove noise. BIRCH clustering was then applied to cluster the data, and minority instances were oversampled using Borderline SMOTE (B-SMOTE). The fraud-detection model used a random forest (RF) classifier. The proposed method achieved a higher F1-score (85.20%) than the baseline sampling techniques tested for comparison. Because of the enormous number of credit card transactions, a small false-positive rate remained. The proposed method improves detection performance owing to the well-organized balancing of the dataset.
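
The first step described in the abstract, Tomek-link undersampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `tomek_link_undersample`, the `majority_label` parameter, and the toy data are all hypothetical, and the brute-force neighbour search is an assumption made for brevity. A Tomek link is a pair of points from different classes that are each other's nearest neighbour; here only the majority-class member of each link is dropped. The BIRCH clustering and B-SMOTE oversampling steps are not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def tomek_link_undersample(X, y, majority_label=0):
    """Drop majority-class points that take part in a Tomek link.

    Hypothetical helper for illustration only; brute-force O(n^2) search.
    """
    def nearest(i):
        # Index of the closest other point to X[i].
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: dist(X[i], X[j]))

    nn = [nearest(i) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: mutual nearest neighbours with different labels.
        if nn[j] == i and y[i] != y[j] and y[i] == majority_label:
            drop.add(i)

    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (assumed): label 0 = legitimate (majority), 1 = fraud (minority).
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (5.0, 5.0), (5.2, 5.0)]
y = [0, 0, 0, 1, 0, 0]

X_res, y_res = tomek_link_undersample(X, y)
print(len(X), "->", len(X_res), "samples after removing Tomek-link majority points")
```

In the toy data, the legitimate transaction at (1.0, 1.0) and the fraudulent one at (1.1, 1.0) form a Tomek link, so the legitimate point is removed while the fraud point is kept. For real datasets, library implementations such as `TomekLinks` and `BorderlineSMOTE` in imbalanced-learn and `Birch` in scikit-learn are likely a better starting point than this sketch.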

Keywords