IEEE Access (Jan 2023)

An Adaptive and Robust Method for Oriented Oversampling With Spatial Information for Imbalanced Noisy Datasets

  • Yi Deng,
  • Mingyong Li

DOI
https://doi.org/10.1109/ACCESS.2023.3329560
Journal volume & issue
Vol. 11
pp. 122610 – 122624

Abstract


Imbalanced datasets have a large negative impact on classifiers, biasing classification results toward the majority class. Because imbalanced data distributions are an inevitable and significant challenge in real-world applications, many variants of SMOTE have been proposed. However, current oversampling methods still need improvement. They rely on hyperparameter optimization, overgeneralize by emphasizing specific synthetic regions, synthesize samples randomly, or suffer performance degradation in the presence of noise. To overcome these problems, we propose OOSI, an adaptive and robust method for oriented oversampling with spatial information, designed for imbalanced noisy datasets. OOSI is one of the few oversampling methods that is both adaptive and effective. It fills the gaps left by existing methods through dataset-specific spatial partitioning and information quantization, three-stage noise suppression, and a spatially informed improvement of the generation path. First, a dataset-specific clustering space is derived adaptively by dividing the data space according to the characteristics of each dataset. Then, each minority cluster is assigned a suitable number of synthetic samples by combining the intra-cluster sparsity of its samples with multi-class density information, so that intra-class and inter-class imbalances are addressed simultaneously. After noise is differentiated and identified, oriented weights are assigned according to the level of multi-class information. These weights guide the generation path of the synthetic samples and prevent the generation of additional noisy and overlapping samples. Extensive experiments demonstrate that the proposed algorithm outperforms 11 prominent oversampling algorithms on 11 real-world datasets with varying noise levels.
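The abstract does not give OOSI's formulas. The sketch below is therefore not OOSI. It only illustrates the general idea behind the allocation step: sparser minority clusters receive more synthetic samples, which are then produced by SMOTE-style linear interpolation between two members of the same cluster. The sparsity measure used here follows the style of cluster-based SMOTE variants such as k-means SMOTE: mean pairwise distance raised to the feature dimension, divided by cluster size. That choice is an assumption, and so are all function names, which are hypothetical. The sketch omits OOSI's multi-class density information, three-stage noise suppression, and oriented weights.

```python
import math
import random


def sparsity_weights(clusters):
    """Normalized sparsity per minority cluster (assumed measure, not OOSI's).

    clusters: list of non-empty lists of points (tuples of floats).
    """
    sparsities = []
    for pts in clusters:
        dists = [math.dist(a, b) for i, a in enumerate(pts) for b in pts[i + 1:]]
        mean_d = sum(dists) / len(dists) if dists else 0.0
        dim = len(pts[0])
        # Sparsity = (mean pairwise distance)^dim / size, i.e. inverse density.
        sparsities.append(mean_d ** dim / len(pts))
    total = sum(sparsities)
    if total == 0:  # all clusters degenerate: fall back to uniform weights
        return [1.0 / len(clusters)] * len(clusters)
    return [s / total for s in sparsities]


def allocate(clusters, n_total):
    """Split n_total synthetic samples across clusters in proportion to sparsity."""
    weights = sparsity_weights(clusters)
    counts = [int(n_total * w) for w in weights]
    # Hand out the rounding remainder to the largest fractional parts.
    remainder = n_total - sum(counts)
    order = sorted(range(len(weights)),
                   key=lambda i: n_total * weights[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


def interpolate(a, b, rng):
    """SMOTE-style point on the segment from a toward b."""
    gap = rng.random()
    return tuple(x + gap * (y - x) for x, y in zip(a, b))


def oversample(clusters, n_total, seed=0):
    """Generate n_total synthetic points, interpolating only within clusters."""
    rng = random.Random(seed)
    synthetic = []
    for pts, k in zip(clusters, allocate(clusters, n_total)):
        for _ in range(k):
            a, b = rng.sample(pts, 2) if len(pts) > 1 else (pts[0], pts[0])
            synthetic.append(interpolate(a, b, rng))
    return synthetic


if __name__ == "__main__":
    tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
    spread = [(5.0, 5.0), (7.0, 5.0), (5.0, 7.0)]
    print(allocate([tight, spread], 10))
```

Interpolating only between members of the same cluster keeps synthetic points inside each cluster's convex hull. This is one simple way to avoid placing samples in majority regions. OOSI's oriented weights are described as going further by steering the generation path itself.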

Keywords