Applied Sciences (Oct 2024)
Imbalanced Data Parameter Optimization of Convolutional Neural Networks Based on Analysis of Variance
Abstract
Classifying imbalanced data is important because accurately categorizing minority class samples has significant practical value, and the problem has attracted considerable interest in many scientific domains. This study primarily uses analysis of variance (ANOVA) to investigate the main and interaction effects of different parameters on classification performance for imbalanced data, aiming to optimize convolutional neural network (CNN) parameters to improve minority class sample recognition. Samples with imbalance ratios of 25:1, 15:1, and 1:1 are extracted from the CIFAR-10 and Fashion-MNIST datasets. To thoroughly assess model performance on imbalanced data, we employ several evaluation metrics: accuracy, recall, F1 score, P-mean, and G-mean. In highly imbalanced datasets, optimizing the learning rate significantly affects all performance metrics. In moderately imbalanced datasets, the interaction between the learning rate and kernel size significantly affects the recognition of minority class samples. Through parameter optimization, the accuracy of the CNN model on the 25:1 highly imbalanced CIFAR-10 and Fashion-MNIST datasets improves by 14.20% and 5.19%, respectively, compared to the default model and by 8.21% and 3.87%, respectively, compared to the undersampling model, while other evaluation metrics for minority classes also improve.
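To illustrate why accuracy alone is not sufficient at a 25:1 imbalance ratio, the following is a minimal sketch, not the authors' implementation. It assumes that an imbalanced subset is built by randomly keeping 1/ratio of each designated minority class, and that G-mean is taken as the geometric mean of per-class recalls (one common definition; the paper's exact formulation may differ). The function names `subsample_minority`, `per_class_recall`, and `g_mean` are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def subsample_minority(labels, minority_classes, ratio, seed=0):
    """Return sorted indices keeping every majority sample and about
    1/ratio of each minority class (assumed sampling scheme)."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    kept = []
    for y, idxs in by_class.items():
        if y in minority_classes:
            n_keep = max(1, int(len(idxs) / ratio))
            idxs = rng.sample(idxs, n_keep)
        kept.extend(idxs)
    return sorted(kept)


def per_class_recall(y_true, y_pred):
    """Recall per class: correctly predicted samples / true samples."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / total[c] for c in total}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition).
    It is zero whenever some class is never recognized."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    # Toy balanced labels standing in for two classes of a real dataset.
    labels = [0] * 500 + [1] * 500
    idx = subsample_minority(labels, minority_classes={1}, ratio=25)
    y_true = [labels[i] for i in idx]
    print(Counter(y_true))  # 500 majority samples, 20 minority samples

    # A degenerate classifier that always predicts the majority class.
    y_pred = [0] * len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(f"accuracy={accuracy:.4f}, G-mean={g_mean(y_true, y_pred):.4f}")
```

In this toy setting the majority-only classifier reaches an accuracy of about 0.96 while its G-mean is 0, which is why minority-sensitive metrics such as recall and G-mean are reported alongside accuracy.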
Keywords