IEEE Access (Jan 2025)
Maximal Information Coefficient-Based Undersampling Method for Highly-Imbalanced Learning
Abstract
Learning from highly-imbalanced datasets remains a major challenge in machine learning because models built by general-purpose learning algorithms often fail to recognize minority-class samples correctly. Undersampling is one family of methods for handling imbalanced learning. In this paper, we propose a new undersampling method based on the maximal information coefficient, comprising two algorithms (MICU-1 and MICU-2), to rebalance the datasets. To evaluate the effectiveness of the method, 20 highly-imbalanced datasets are used as benchmarks. The results show that, compared with other undersampling methods, the maximal information coefficient-based undersampling method is competitive in terms of G-mean and F-measure.
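For readers unfamiliar with the two evaluation metrics named above, the following is a minimal sketch of how G-mean and F-measure are commonly computed from binary confusion-matrix counts, with the minority class treated as positive. It assumes the usual definitions (G-mean as the geometric mean of sensitivity and specificity, F-measure as the weighted harmonic mean of precision and recall); the abstract does not state which variants the paper uses. The function names `g_mean` and `f_measure` and the example counts are illustrative only and are not taken from the paper.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (minority recall) and specificity.

    Assumed definition: sqrt(TP/(TP+FN) * TN/(TN+FP)).
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(tp, fn, fp, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 10 minority and 990 majority samples.
# A classifier that predicts "majority" for every sample is 99% accurate,
# yet both metrics drop to zero because no minority sample is recognized.
trivial_g = g_mean(tp=0, fn=10, tn=990, fp=0)
trivial_f = f_measure(tp=0, fn=10, fp=0)

# A classifier that finds 8 of the 10 minority samples at the cost of
# 90 false alarms on the majority class.
better_g = g_mean(tp=8, fn=2, tn=900, fp=90)
better_f = f_measure(tp=8, fn=2, fp=90)
```

Both metrics penalize a model that ignores the minority class, which is why they are preferred over plain accuracy when classes are highly imbalanced.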
Keywords