IEEE Access (Jan 2020)

SOTB: Semi-Supervised Oversampling Approach Based on Trigonal Barycenter Theory

  • Dingxiang Liu,
  • Shaojie Qiao,
  • Nan Han,
  • Tao Wu,
  • Rui Mao,
  • Yongqing Zhang,
  • Chang-An Yuan,
  • Yueqiang Xiao

DOI
https://doi.org/10.1109/ACCESS.2020.2980157
Journal volume & issue
Vol. 8
pp. 50180 – 50189

Abstract


Classifying imbalanced data is an active research direction in machine learning and bioinformatics. Data imbalance can greatly degrade the accuracy of classifiers. Good oversampling methods improve the diversity and validity of the newly generated samples, which can not only resolve the imbalance in the sample data but also substantially improve classification accuracy. In this study, we propose the trigonal barycenter theory and a semi-supervised oversampling method called SOTB (Semi-supervised Oversampling method based on Trigonal Barycenter theory). SOTB works in two steps: (1) it constructs non-intersecting triangles based on the Mahalanobis distance; (2) it combines semi-supervised sampling with the trigonal barycenter theory to oversample the positive samples, which addresses the data imbalance problem without affecting data quality. Finally, extensive experiments were conducted to verify the effectiveness of the proposed method. The results demonstrate that SOTB improves the validity, diversity and distributional rationality of the newly generated samples, and alleviates the over-fitting that is common in existing oversampling approaches. In particular, compared with state-of-the-art oversampling methods, SOTB achieves the best classification performance.
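The abstract names two geometric ingredients without giving SOTB's exact rules: the Mahalanobis distance d(x, y) = sqrt((x - y)^T S^-1 (x - y)), where S is the sample covariance, and the barycenter (A + B + C) / 3 of a triangle ABC. The sketch below illustrates only those ingredients under stated assumptions; it is not the authors' algorithm. It assumes each triangle is formed by a minority sample and its two nearest minority neighbours under Mahalanobis distance. It does not enforce the paper's non-intersection constraint and omits the semi-supervised step entirely. The names `mahalanobis_matrix` and `barycenter_oversample` are hypothetical.

```python
import numpy as np


def mahalanobis_matrix(X):
    """Pairwise Mahalanobis distances between the rows of X.

    Uses the pseudo-inverse of the sample covariance so that a singular
    covariance (e.g. collinear features) does not raise an error.
    """
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    diff = X[:, None, :] - X[None, :, :]            # shape (n, n, d)
    # Quadratic form diff^T S^-1 diff for every pair (i, j).
    sq = np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)
    return np.sqrt(np.maximum(sq, 0.0))             # clip tiny negative round-off


def barycenter_oversample(X_min, n_new, seed=0):
    """Hypothetical sketch: synthesize minority samples as triangle barycenters.

    Assumption (not from the paper): each triangle is a minority sample plus
    its two nearest minority neighbours under Mahalanobis distance. The paper's
    non-intersection constraint and semi-supervised step are NOT implemented.
    Returns at most one barycenter per distinct triangle (no duplicates).
    """
    if len(X_min) < 3:
        raise ValueError("need at least 3 minority samples to form a triangle")
    rng = np.random.default_rng(seed)
    D = mahalanobis_matrix(X_min)
    np.fill_diagonal(D, np.inf)                     # a point is not its own neighbour
    neighbours = np.argsort(D, axis=1)[:, :2]       # two nearest neighbours per row
    # Deduplicate triangles shared by several vertices.
    triangles = sorted({tuple(sorted((i, int(j), int(k))))
                        for i, (j, k) in enumerate(neighbours)})
    size = min(n_new, len(triangles))
    chosen = rng.choice(len(triangles), size=size, replace=False)
    # Barycenter of triangle ABC is (A + B + C) / 3, i.e. the mean of its vertices.
    return np.array([X_min[list(triangles[t])].mean(axis=0) for t in chosen])
```

Because every barycenter is an average of three existing minority samples, each synthetic point lies inside the triangle that produced it; this is one plausible reading of why a barycenter-based scheme keeps new samples within the minority distribution rather than extrapolating beyond it.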

Keywords