IEEE Access (Jan 2022)

Constrained Oversampling: An Oversampling Approach to Reduce Noise Generation in Imbalanced Datasets With Class Overlapping

  • Changhui Liu,
  • Sun Jin,
  • Donghong Wang,
  • Zichao Luo,
  • Jianbo Yu,
  • Binghai Zhou,
  • Changlin Yang

DOI
https://doi.org/10.1109/ACCESS.2020.3018911
Journal volume & issue
Vol. 10
pp. 91452 – 91465

Abstract

Imbalanced datasets are pervasive in classification tasks and degrade classifier performance on minority-class samples. Oversampling is an effective remedy for class imbalance. However, existing oversampling methods generally introduce noisy examples into the original dataset, especially when the dataset contains class-overlapping regions. In this study, a new oversampling method named Constrained Oversampling is proposed to reduce noise generation during oversampling. The algorithm first extracts the overlapping regions in the dataset. Second, Ant Colony Optimization is applied to define the boundaries of the minority regions. Third, new samples are synthesized under constraints to obtain a balanced dataset. The proposal distinguishes itself from other techniques by incorporating constraints into the oversampling process to inhibit noise generation. Experiments show that it outperforms various benchmark oversampling approaches. The effectiveness of the method is explained by studying the impact of class overlapping on imbalanced learning.
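
The general idea of oversampling under a constraint can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no implementation details, so the Ant Colony Optimization boundary step is replaced here by a simple nearest-neighbour check, and the function name `constrained_oversample`, its parameters, and the overlap heuristic are all assumptions made for illustration. It assumes at least two minority samples.

```python
import numpy as np

def constrained_oversample(X, y, minority=1, k=3, seed=0, max_tries=10_000):
    """Illustrative only: SMOTE-style interpolation with a rejection constraint.

    The nearest-neighbour checks below are assumed stand-ins for the paper's
    overlap extraction and Ant Colony Optimization boundary steps.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X_min, X_maj = X[y == minority], X[y != minority]
    n_needed = len(X_maj) - len(X_min)  # samples required to balance the classes

    def dists(p, pts):
        return np.linalg.norm(pts - p, axis=1)

    # Step 1 (assumed proxy): flag a minority point as lying in an overlapping
    # region if any of its k nearest neighbours is a majority point.
    # The flags are returned for inspection only.
    overlap = np.zeros(len(X_min), dtype=bool)
    for i, p in enumerate(X_min):
        nn = np.argsort(dists(p, X))[1:k + 1]  # index 0 is the point itself
        overlap[i] = np.any(y[nn] != minority)

    # Steps 2-3 (assumed proxy): interpolate between minority neighbours and
    # keep a candidate only if it is at least as close to a minority sample
    # as to any majority sample.
    synthetic = []
    tries = 0
    while len(synthetic) < n_needed and tries < max_tries:
        tries += 1
        i = rng.integers(len(X_min))
        nbrs = np.argsort(dists(X_min[i], X_min))[1:k + 1]
        j = rng.choice(nbrs)
        cand = X_min[i] + rng.random() * (X_min[j] - X_min[i])
        if dists(cand, X_min).min() <= dists(cand, X_maj).min():
            synthetic.append(cand)

    synthetic = np.array(synthetic).reshape(-1, X.shape[1])
    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(len(synthetic), minority)])
    return X_new, y_new, overlap
```

The rejection test is what separates this sketch from plain interpolation: candidates that land nearer to the majority class are discarded instead of being added as potential noise. If too few candidates pass within `max_tries`, the result may remain slightly imbalanced.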

Keywords