Ilkom Jurnal Ilmiah (Aug 2023)

Decision Tree C4.5 Performance Improvement using Synthetic Minority Oversampling Technique (SMOTE) and K-Nearest Neighbor for Debtor Eligibility Evaluation

  • Edi Priyanto,
  • Enny Itje Sela,
  • Luther Alexander Latumakulita,
  • Noourul Islam

DOI
https://doi.org/10.33096/ilkom.v15i2.1676.373-381
Journal volume & issue
Vol. 15, no. 2
pp. 373 – 381

Abstract


Information technology, and machine learning in particular, is now widely used to evaluate the eligibility of debtors. One challenge for such classification models is class imbalance, which is present in the German Credit Dataset. Another is developing an optimal model for evaluating debtor eligibility. To address both, this study develops a model for evaluating debtor eligibility on the German Credit Dataset using decision trees, k-Nearest Neighbor (k-NN), and the Synthetic Minority Oversampling Technique (SMOTE). SMOTE and k-NN are used to overcome the imbalance in the dataset, while the decision tree is applied to produce the debtor classification model. The overall steps are dataset preparation, data pre-processing, dataset splitting, oversampling with SMOTE, classification with a decision tree, and testing. Model performance is evaluated by the accuracy obtained from the confusion matrix and by the area under the curve (AUC) of the Receiver Operating Characteristic (ROC). In the tests carried out, the best accuracy was 73.00% with an AUC of 0.708, obtained with parameters k = 3 and Max-Depth = 25. Based on this analysis, the proposed model improves performance compared with training on the dataset without SMOTE.
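The oversampling step and the two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`smote`, `accuracy`, `auc`), the random seed, and the toy points are assumptions made for illustration only, and no data from the German Credit Dataset is used. The sketch shows the core idea of SMOTE, which is to create a synthetic minority sample by interpolating between a minority sample and one of its k nearest minority neighbors.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Hypothetical helper: each new point lies on the line segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbors (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbors are all other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


def accuracy(tp, tn, fp, fn):
    """Accuracy from the four cells of a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy minority-class points, invented for this example.
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote(toy_minority, n_synthetic=3, k=3):
        print(point)
```

In a pipeline like the one described, oversampling would normally be applied only to the training split, so that the test split still reflects the original class distribution. The sketch also assumes numeric features; categorical attributes would need to be encoded or handled by a different variant.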

Keywords