Applied Computational Intelligence and Soft Computing (Jan 2024)
Boosting Software Fault Prediction: Addressing Class Imbalance With Enhanced Ensemble Learning
Abstract
Software fault prediction (SFP) is a crucial aspect of software engineering that aids the early identification of potential defects, and this proactive approach contributes significantly to software quality and reliability. A common challenge in SFP, however, is class imbalance (CI). Ensemble learning (EL) is a powerful strategy for refining SFP models in object-oriented systems with imbalanced data and for improving sensitivity to minority classes. This study aims to improve the effectiveness of ensemble classifiers for SFP in object-oriented systems, tackling the challenges associated with imbalanced data. It focuses on enhancing the performance of three ensemble classifiers explicitly designed for imbalanced datasets: BalancedBagging, RUSBoost, and EasyEnsemble. In Enhanced_BalancedBagging (E_BB) and ROSBoost, the enhanced counterparts of BalancedBagging and RUSBoost, random undersampling (RUS) is replaced with random oversampling (ROS). Enhanced_EasyEnsemble (E_EE) replaces RUS with ROS and AdaBoost with XGBoost. The experimental results show that E_BB, ROSBoost, and E_EE outperform their base models, achieving the highest F-measure, balanced accuracy, and AUC. Statistical tests, such as the Wilcoxon signed-rank test, support the enhanced models: low negative rank sums and large effect sizes indicate substantial, practically significant improvements in F-measure and AUC.
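To make the RUS-to-ROS substitution concrete, the following is a minimal, standard-library-only sketch of the two resampling steps on a toy imbalanced dataset. It is an illustration under stated assumptions, not the implementation used in the study: the function names `random_undersample` and `random_oversample`, the 90/10 class split, and the single-feature rows are all hypothetical, and the ensemble classifiers that would be trained on each resampled set are omitted.

```python
import random
from collections import Counter


def _group_by_class(X, y):
    """Collect rows per class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = min(len(rows) for rows in by_class.values())
    pairs = [(row, label)
             for label, rows in by_class.items()
             for row in rng.sample(rows, target)]  # sampling without replacement
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


def random_oversample(X, y, seed=0):
    """ROS: keep every row and duplicate minority rows up to the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    target = max(len(rows) for rows in by_class.values())
    pairs = []
    for label, rows in by_class.items():
        # Draw the missing rows with replacement from the same class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        pairs.extend((row, label) for row in rows + extra)
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


# Hypothetical toy data: 90 non-faulty modules (0) and 10 faulty modules (1).
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

X_rus, y_rus = random_undersample(X, y)
X_ros, y_ros = random_oversample(X, y)

print("original:", Counter(y))
print("RUS:     ", Counter(y_rus))
print("ROS:     ", Counter(y_ros))
```

In this sketch, RUS balances the classes by discarding majority-class rows, whereas ROS balances them by repeating minority-class rows, so every original observation remains available to the base learners. In an ensemble setting, one such resampling would be drawn for each base learner.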