IEEE Access (Jan 2024)

FADA-SMOTE-Ms: Fuzzy Adaptative Smote-Based Methods

  • Roudani Mohammed,
  • El Moutaouakil Karim

DOI
https://doi.org/10.1109/ACCESS.2024.3480848
Journal volume & issue
Vol. 12
pp. 158742 – 158765

Abstract


The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the best-known methods for addressing unequal class distribution in imbalanced datasets. However, it has three shortcomings: (1) it may cause over-generalization by oversampling noisy samples, (2) it oversamples uninformative samples, and (3) it increases the overlap between classes around the class boundaries. Various SMOTE-based approaches have been proposed to handle these problems, but most of them rely on hyperparameters and tend to generate noise, because each synthetic sample is generated at random in the area delimited by randomly chosen minority data. In this research, an improved SMOTE-based method, namely Fuzzy-ADAptative-SMOTE-Based-Methods (FADA-SMOTE-Ms), which targets all three problems at the same time, is introduced. In this method, the SMOTE interpolation coefficient $\alpha$ ($\alpha$-SMOTE) is chosen so that the synthetic data is as far as possible from the two closest majority data. More precisely, the method proceeds in six steps: (a) clustering the minority class into k groups, (b) selecting a safe region, (c) selecting two minority data at random, (d) finding the M majority data closest to these minority data using original membership functions based on fuzzy mean and filtering results, (e) finding the $\alpha$-SMOTE that produces a synthetic data as close as possible to the minority class and as far as possible from the M majority data by solving a very simple multi-objective mathematical optimization model, and (f) using SMOTE with the optimal $\alpha$-SMOTE to generate synthetic samples. FADA-SMOTE-Ms is evaluated using 5 classifiers and 21 unbalanced datasets, and it is compared with 8 oversampling methods using three performance measures. FADA-SMOTE-Ms consistently outperforms other popular oversampling methods.
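
To make steps (c) to (f) concrete, the following is a minimal sketch of choosing the interpolation coefficient so that the synthetic point stays away from nearby majority data. It is not the authors' model. It assumes plain Euclidean nearest neighbours in place of the fuzzy membership functions, and a one-dimensional grid search over $\alpha$ in place of the multi-objective optimization. The function name `alpha_smote_point` and the parameters `m` and `n_grid` are hypothetical.

```python
import numpy as np


def alpha_smote_point(x1, x2, majority, m=2, n_grid=101):
    """Pick alpha in [0, 1] so that x1 + alpha * (x2 - x1) is far from majority data.

    Illustrative assumptions (not taken from the paper):
    - the m closest majority points are found by Euclidean distance;
    - alpha is chosen by grid search, maximizing the distance to the
      nearest of those m majority points.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    majority = np.asarray(majority, dtype=float)

    # Distance of each majority point to the closer of the two minority points.
    d1 = np.linalg.norm(majority - x1, axis=1)
    d2 = np.linalg.norm(majority - x2, axis=1)
    nearest_idx = np.argsort(np.minimum(d1, d2))[:m]
    nearest = majority[nearest_idx]

    # Candidate synthetic points on the segment between x1 and x2.
    alphas = np.linspace(0.0, 1.0, n_grid)
    candidates = x1 + alphas[:, None] * (x2 - x1)

    # For each candidate, distance to its closest selected majority point.
    dists = np.linalg.norm(candidates[:, None, :] - nearest[None, :, :], axis=2)
    score = dists.min(axis=1)

    best = int(np.argmax(score))
    return alphas[best], candidates[best]


if __name__ == "__main__":
    alpha, synthetic = alpha_smote_point(
        x1=[0.0, 0.0],
        x2=[1.0, 0.0],
        majority=[[0.0, 0.1], [1.0, 0.1], [5.0, 5.0]],
        m=2,
    )
    print(alpha, synthetic)
```

Because the candidate is always on the segment between two minority points, it remains inside the minority region by construction, and the search only decides where on that segment the point is least exposed to the majority class. Standard SMOTE corresponds to drawing `alpha` uniformly at random instead of searching for it.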

Keywords