IEEE Access (Jan 2022)
Holistic Parameter Optimization for Software Defect Prediction
Abstract
A software defect prediction (SDP) model identifies defect-prone modules. Setting appropriate parameters in an SDP model is critical because they affect model performance. Recent studies have explored parameters automatically using optimization algorithms. However, these studies optimized parameters only in individual steps of the modeling process, such as feature selection or model building, rather than all the parameters that can be handled across the SDP process from preprocessing to model building. Our goal is to improve model performance by optimizing parameters across the entire SDP process. To this end, we propose a cost-sensitive decision tree based on harmony search (HS-CSDT). HS-CSDT uses a harmony search algorithm to simultaneously identify the optimal feature set, regularization technique, class weight, and decision tree hyperparameters. We compared HS-CSDT against the methods in related studies on 28 open-source projects in terms of probability of detection, probability of false alarm, G-measure, and file inspection reduction. Effect sizes measured with Cohen’s d indicate that HS-CSDT performs better than the methods in related work. The experimental results show that using an optimization algorithm to tune the identified parameters throughout the entire SDP modeling process helps improve model performance. In summary, HS-CSDT achieves strong defect prediction performance by automatically selecting an appropriate parameter set for each software project. Thus, the model can help allocate limited quality assurance resources effectively.
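As a rough illustration of how a harmony search can tune several kinds of parameters at once, the following minimal Python sketch searches a small discrete space. It is not the HS-CSDT implementation: the search space `SPACE`, the objective `toy_score`, and the settings `hmcr`, `par`, and `memory_size` are hypothetical choices made for this example, and a real run would replace `toy_score` with a function that trains and validates a cost-sensitive decision tree.

```python
import random

# Hypothetical search space. Names and ranges are illustrative assumptions,
# not the parameter set used in the paper.
SPACE = {
    "max_depth": list(range(1, 11)),
    "min_samples_leaf": list(range(1, 21)),
    "class_weight": [1, 2, 5, 10, 20],
    "scaler": ["none", "minmax", "zscore"],
}


def random_harmony(rng):
    """Draw one candidate parameter set uniformly at random."""
    return {name: rng.choice(values) for name, values in SPACE.items()}


def toy_score(h):
    """Stand-in objective (higher is better), used only to make the sketch runnable."""
    score = -abs(h["max_depth"] - 6) - abs(h["min_samples_leaf"] - 4) / 4
    score -= abs(h["class_weight"] - 5) / 5
    score += 1.0 if h["scaler"] == "minmax" else 0.0
    return score


def harmony_search(score, iterations=500, memory_size=10, hmcr=0.9, par=0.3, seed=0):
    """Return (best parameter set, its score, best score in the initial memory)."""
    rng = random.Random(seed)
    memory = [random_harmony(rng) for _ in range(memory_size)]
    scores = [score(h) for h in memory]
    initial_best = max(scores)

    for _ in range(iterations):
        new = {}
        for name, values in SPACE.items():
            if rng.random() < hmcr:
                # Memory consideration: reuse a value from a stored harmony.
                value = rng.choice(memory)[name]
                if rng.random() < par:
                    # Pitch adjustment: step to a neighbouring candidate value
                    # (for categorical entries the list order is arbitrary).
                    i = values.index(value) + rng.choice([-1, 1])
                    value = values[min(max(i, 0), len(values) - 1)]
            else:
                # Random selection from the full range of candidates.
                value = rng.choice(values)
            new[name] = value

        # Replace the worst stored harmony if the new one scores higher.
        new_score = score(new)
        worst = min(range(memory_size), key=scores.__getitem__)
        if new_score > scores[worst]:
            memory[worst], scores[worst] = new, new_score

    best = max(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best], initial_best
```

Calling `harmony_search(toy_score)` returns the best parameter set found, its score, and the best score in the initial random memory. Because a stored harmony is only ever replaced by a higher-scoring one, the returned score can never fall below that initial best.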
Keywords