IEEE Access (Jan 2024)
An Adaptive Safe-Region Diversity Oversampling Algorithm for Imbalanced Classification
Abstract
The challenge of imbalanced data classification stems from the uneven distribution of data across classes, which poses a formidable obstacle for traditional classifiers. Although numerous methods have been proposed to address this problem, generating artificial instances through oversampling is widely regarded as an effective and versatile strategy for balancing the class distribution. We observe, however, that existing oversampling methods are prone to generating unnecessary and noisy instances in complex imbalanced scenarios. To address this, we introduce a novel approach called Adaptive Safe-Region Diversity Oversampling (ASRDO). ASRDO first calculates the distance from each minority class instance to its nearest majority class instance. Using this distance as a radius, it defines a safe hyperspherical sampling region around each minority instance. The algorithm then assigns a weight to each minority instance based on the density within its sampling region and its average distance to its k nearest majority instances. Finally, it randomly selects two instances from the k nearest minority instances, generates a new direction vector as a linear combination of them, and synthesizes minority instances along this direction vector within the sampling region. Experimental results on 32 public datasets show that the proposed method significantly outperforms prevalent oversampling methods. A Python implementation of ASRDO is provided for reference.
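The steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' reference implementation. The function name `asrdo_sketch`, its parameters, the weighting formula (mean majority distance divided by one plus the local minority count), and the way the two neighbours are combined into a direction are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def asrdo_sketch(X_min, X_maj, n_new, k=5, seed=None):
    """Illustrative safe-region oversampling; names and formulas are assumed."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min = len(X_min)
    k_min = min(k, n_min - 1)
    k_maj = min(k, len(X_maj))
    if k_min < 2:
        raise ValueError("need at least three minority instances")

    # Pairwise distances: minority-to-majority and minority-to-minority.
    d_maj = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2)
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)  # an instance is not its own neighbour

    # Safe-region radius: distance to the nearest majority instance.
    radius = d_maj.min(axis=1)

    # Density: number of other minority instances inside the safe region.
    density = (d_min < radius[:, None]).sum(axis=1)
    # Average distance to the k nearest majority instances.
    mean_maj = np.sort(d_maj, axis=1)[:, :k_maj].mean(axis=1)

    # ASSUMED weighting (placeholder, not from the paper): favour instances
    # that are far from the majority class and lie in sparse regions.
    weights = mean_maj / (1.0 + density)
    total = weights.sum()
    weights = weights / total if total > 0 else np.full(n_min, 1.0 / n_min)

    # Indices of the k nearest minority neighbours of every minority instance.
    neighbours = np.argsort(d_min, axis=1)[:, :k_min]

    seeds = rng.choice(n_min, size=n_new, p=weights)
    synthetic = np.empty((n_new, X_min.shape[1]))
    for j, i in enumerate(seeds):
        # Pick two distinct neighbours and combine them into one direction.
        a, b = rng.choice(neighbours[i], size=2, replace=False)
        lam = rng.random()
        direction = lam * (X_min[a] - X_min[i]) + (1 - lam) * (X_min[b] - X_min[i])
        norm = np.linalg.norm(direction)
        if norm > 0:
            direction = direction / norm
        # Step length is strictly below the radius, so the new point stays
        # inside the safe hypersphere of the seed instance.
        synthetic[j] = X_min[i] + rng.random() * radius[i] * direction
    return synthetic
```

Because every step length is bounded by the seed's radius, each synthetic point is closer to its seed than the nearest majority instance is, which is the property that the safe region is meant to provide. Mixing two neighbours rather than interpolating toward a single one spreads the synthetic points over more directions than a SMOTE-style line segment would.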
Keywords