Sistemasi: Jurnal Sistem Informasi (Nov 2024)

Optimization of the Naive Bayes Algorithm with SMOTETomek Combination for Imbalance Class Fraud Detection

  • Arief Tri Arsanto,
  • Arif Faizin,
  • Moch lutfi,
  • Zulfatun Nikmatus Saadah

DOI
https://doi.org/10.32520/stmsi.v13i6.4719
Journal volume & issue
Vol. 13, no. 6
pp. 2709 – 2721

Abstract


The use of credit cards in the modern era is increasing, and so is the potential for card fraud. It is therefore necessary to prevent fraud with technologies such as address verification systems (AVS), card verification methods (CVM), and personal identification numbers (PIN). Transaction datasets also need to be analyzed to examine the history of past transactions. The fraud detection dataset shows an imbalanced class distribution. Class imbalance is a significant problem in machine learning that can affect overall model performance: one class (the majority) contains far more samples than the other (the minority). The focus of this research is credit card fraud classification on an imbalanced dataset, using the Naive Bayes method as the classification algorithm. To overcome the class imbalance, the research applies a resampling approach that combines SMOTE oversampling with Tomek-link undersampling, known as SMOTETomek. SMOTETomek first generates synthetic minority samples with SMOTE and then removes Tomek links, i.e. pairs of mutually nearest samples that belong to the minority and majority classes, which reduces the number of samples near the class boundary. Because Naive Bayes suffered from the data imbalance in this study, this resampling method was proposed in the hope of improving its performance. Based on the area under the ROC curve (ROC-AUC), the SMOTETomek method improved the performance of the Naive Bayes algorithm; the higher the ROC-AUC score, the better the model distinguishes between the two classes. The accuracy, however, did not change significantly.
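The SMOTETomek combination described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `smote`, `nearest_index`, `remove_tomek_links` and `smote_tomek`, the neighbour count `k`, the random seed, binary labels, and the choice to drop both members of each Tomek link are assumptions made for the example. In practice, the imbalanced-learn library provides a ready-made `SMOTETomek` class in `imblearn.combine`.

```python
import math
import random


def smote(X, y, minority, k=5, seed=0):
    """SMOTE (binary labels assumed): add synthetic minority samples until both classes are equal.

    Each synthetic point lies on the segment between a random original minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    n_needed = sum(1 for label in y if label != minority) - len(minority_pts)
    k = min(k, len(minority_pts) - 1)  # requires at least 2 minority samples
    new_X, new_y = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        i = rng.randrange(len(minority_pts))
        base = minority_pts[i]
        others = [p for j, p in enumerate(minority_pts) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the base -> neighbour segment
        new_X.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
        new_y.append(minority)
    return new_X, new_y


def nearest_index(X, i):
    """Index of the nearest neighbour of X[i] (brute force, excluding i itself)."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def remove_tomek_links(X, y):
    """Drop both samples of every Tomek link: mutual nearest neighbours with different labels."""
    nn = [nearest_index(X, i) for i in range(len(X))]
    drop = {i for i in range(len(X)) if y[i] != y[nn[i]] and nn[nn[i]] == i}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority, k=5, seed=0):
    """SMOTETomek: oversample with SMOTE, then clean the class boundary by removing Tomek links."""
    X_res, y_res = smote(X, y, minority, k=k, seed=seed)
    return remove_tomek_links(X_res, y_res)


# Hypothetical toy data: 20 majority (label 0) points on a grid, 4 minority (label 1) points.
X = [(float(i % 5), float(i // 5)) for i in range(20)] + [(4.5, 3.5), (5.0, 4.0), (5.5, 3.5), (4.6, 3.2)]
y = [0] * 20 + [1] * 4
X_res, y_res = smote_tomek(X, y, minority=1, k=3, seed=42)
print(y_res.count(0), y_res.count(1))  # the two counts are equal
```

Because every point has exactly one nearest neighbour in this sketch, each Tomek link removes one minority and one majority sample, so classes balanced by SMOTE stay balanced after cleaning. The brute-force neighbour search is O(n²) and only suits small toy data.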
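The abstract's observation that ROC-AUC can improve while accuracy barely changes follows from how the two metrics treat an imbalanced test set. The numbers below (95 legitimate and 5 fraudulent transactions, two made-up score vectors, a 0.5 decision threshold) are hypothetical and not taken from the study; they only illustrate the effect. The helper `roc_auc` is an assumed name that uses the rank interpretation of AUC: the probability that a randomly chosen fraud case receives a higher score than a randomly chosen legitimate one, with ties counted as one half.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """ROC-AUC via pairwise ranking: P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def predict(scores, threshold=0.5):
    """Hard labels from scores with a fixed decision threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical imbalanced test set: 95 legitimate (0) and 5 fraudulent (1) transactions.
y_true = [0] * 95 + [1] * 5
scores_flat = [0.1] * 100                # model that cannot separate the classes
scores_ranked = [0.1] * 95 + [0.3] * 5   # model that ranks every fraud case higher

for name, scores in [("flat", scores_flat), ("ranked", scores_ranked)]:
    print(name, accuracy(y_true, predict(scores)), roc_auc(y_true, scores))
# prints:
# flat 0.95 0.5
# ranked 0.95 1.0
```

Both models reach the same 95% accuracy simply by labelling every transaction as legitimate, yet only the second one separates fraud from non-fraud, which ROC-AUC captures and accuracy does not.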