Journal of King Saud University: Computer and Information Sciences (Sep 2022)

RN-SMOTE: Reduced Noise SMOTE based on DBSCAN for enhancing imbalanced data classification

  • Ahmed Arafa,
  • Nawal El-Fishawy,
  • Mohammed Badawy,
  • Marwa Radad

Journal volume & issue
Vol. 34, no. 8
pp. 5059 – 5074

Abstract

Machine learning classifiers perform well on balanced datasets. Unfortunately, many real-world datasets are naturally imbalanced, which makes imbalanced classification a serious problem in machine learning: the skewed class distribution misleads classifiers and causes them to misclassify the minority class. This paper introduces Reduced Noise-SMOTE (RN-SMOTE) for pre-processing imbalanced data. RN-SMOTE first oversamples the training data using SMOTE, which introduces noisy synthetic instances in the minority class. It then applies DBSCAN to detect and remove the noise. Next, the clean synthetic instances are combined with the original data. Finally, RN-SMOTE applies SMOTE again to rebalance the dataset before passing it to the underlying classifier. RN-SMOTE is evaluated using 9 different classifiers and 9 different imbalanced datasets with different imbalance ratios, five of which are used for outlier detection. The results show that RN-SMOTE improves classifier performance over both the original data and plain SMOTE, by a margin that depends on the classifier, dataset, and evaluation metric. RN-SMOTE is also compared with current state-of-the-art methods, with improvements of up to 37.41%, 23.28%, 13.95%, and 9.07% in Recall, F1, Precision, and Accuracy, respectively.
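The four steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names (`smote`, `dbscan_noise`, `rn_smote`), the parameters `eps`, `min_pts`, and `k`, and the choice to run DBSCAN on the minority class and discard only synthetic points flagged as noise are all assumptions made for illustration. The abstract does not specify these details.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbors (basic SMOTE idea).
    Requires at least two minority points when n_new > 0."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of `base`, excluding `base` itself
        neighbors = sorted((p for p in minority if p is not base),
                           key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def dbscan_noise(points, eps, min_pts):
    """Return one boolean per point: True where DBSCAN would label it noise.
    A point is noise if it is not a core point and no core point lies
    within eps of it. Neighborhoods include the point itself."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]
    return [not core[i] and not any(core[j] for j in neigh[i])
            for i in range(n)]


def rn_smote(majority, minority, eps, min_pts, k=5, seed=0):
    """Hypothetical sketch of the four RN-SMOTE steps from the abstract."""
    rng = random.Random(seed)
    # Step 1: oversample the minority class with SMOTE.
    synthetic = smote(minority, len(majority) - len(minority), k, rng)
    # Step 2 (assumption): run DBSCAN on original + synthetic minority
    # points and drop the synthetic points flagged as noise.
    noise = dbscan_noise(minority + synthetic, eps, min_pts)
    clean = [p for p, is_noise in zip(synthetic, noise[len(minority):])
             if not is_noise]
    # Step 3: combine the clean synthetic points with the original data.
    combined = minority + clean
    # Step 4: apply SMOTE again to restore the class balance.
    combined += smote(combined, len(majority) - len(combined), k, rng)
    return majority, combined


if __name__ == "__main__":
    data_rng = random.Random(1)
    majority = [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(40)]
    minority = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(8)]
    maj, mino = rn_smote(majority, minority, eps=1.0, min_pts=3)
    print(len(maj), len(mino))
```

In practice the same pipeline would more likely be built from existing library implementations of SMOTE and DBSCAN; the pure-Python version is used here only to keep the sketch self-contained. The values of `eps` and `min_pts` strongly influence which points are treated as noise and would need tuning per dataset.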

Keywords