Journal of Management Science and Engineering (Dec 2022)

Improved hybrid resampling and ensemble model for imbalance learning and credit evaluation

  • Gang Kou,
  • Hao Chen,
  • Mohammed A. Hefni

Journal volume & issue
Vol. 7, no. 4
pp. 511 – 529

Abstract

Clustering-based undersampling (CUS) and the distance-based near-miss method are widely used in current imbalanced learning algorithms, but both methods have drawbacks. In particular, CUS does not consider the influence of distance on the majority-class instances, and the near-miss method overlooks the sub-classes (clusters) within the majority-class samples. To overcome these drawbacks, this study proposes an undersampling method that combines distance measurement with majority-class clustering. Resampling methods are used to develop an ensemble-based imbalanced learning algorithm called the clustering and distance-based imbalance learning model (CDEILM), which combines distance-based undersampling, feature selection, and ensemble learning. In addition, a cluster size-based resampling (CSBR) method is proposed to preserve the original distribution of the majority class, and a hybrid imbalanced learning framework is constructed by fusing various types of resampling methods. The combination of CDEILM and CSBR can be regarded as a special case of this hybrid framework. The experimental results show that the CDEILM and CSBR methods outperform the benchmark methods and that the hybrid model provides the best results under most circumstances. Therefore, the proposed model can serve as an alternative imbalanced learning method under specific circumstances, e.g., for credit evaluation problems in financial applications.
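
The general idea of combining majority-class clustering with a distance criterion can be illustrated with a minimal sketch. This is not the CDEILM or CSBR algorithm from the paper, whose details the abstract does not give. The function names (`kmeans`, `cluster_distance_undersample`), the plain k-means step, the near-miss-style ranking by mean distance to the minority class, and the size-proportional quota per cluster are all assumptions chosen for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative); returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (unchanged if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cluster_distance_undersample(majority, minority, k=2, n_keep=None, seed=0):
    """Hypothetical helper: keep roughly n_keep majority points, cluster by cluster."""
    n_keep = len(minority) if n_keep is None else n_keep
    labels = kmeans(majority, k, seed=seed)
    kept = []
    for c in range(k):
        members = [p for p, lab in zip(majority, labels) if lab == c]
        if not members:
            continue
        # Assumption: quota proportional to cluster size, at least one point
        # per non-empty cluster, so the majority distribution is roughly kept.
        quota = max(1, round(n_keep * len(members) / len(majority)))
        # Assumption: near-miss-style ranking, smallest mean distance to the
        # minority class first.
        members.sort(key=lambda p: sum(math.dist(p, m) for m in minority) / len(minority))
        kept.extend(members[:quota])
    return kept
```

Calling `cluster_distance_undersample(majority, minority, k=2, n_keep=4)` on lists of coordinate tuples returns a reduced majority set drawn from every non-empty cluster. Because of the per-cluster rounding, the result can differ slightly in size from `n_keep`.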

Keywords