Human-Centric Intelligent Systems (Jul 2023)

Hybrid Defect Prediction Model Based on Counterfactual Feature Optimization

  • Wei Zheng,
  • Teng Fei Chen,
  • Mei Ting Hu,
  • Feng Yu Yang,
  • Xin Fan,
  • Peng Xiao

DOI
https://doi.org/10.1007/s44230-023-00034-2
Journal volume & issue
Vol. 3, no. 3
pp. 366 – 380

Abstract

Read online

Abstract Software defect prediction is critical to ensuring software quality. Researchers have worked on building various defect prediction models to improve the performance of defect prediction. Existing defect prediction models are mainly divided into two categories: models constructed based on artificial statistical features and models constructed based on semantic features. DP-CNN [Li J, He P, Zhu J, et al. Software defect prediction via convolutional neural network. In: 2017 IEEE international conference on software quality, reliability and security (QRS). IEEE, 2017; 318–328.] is one of the best defect prediction models, because it combines both artificial statistical features and semantic features, so its performance is greatly improved compared to traditional defect prediction models. This paper is based on the DP-CNN model and makes the following two improvements: first, using a new Struc2vec network representation technique to mine existing information between software modules, which specializes in learning node representations from structural identity and can further extract structural features associated with defects. Let the DP-CNN model once again incorporate the newly mined structural features. Then, this paper proposes a feature selection method based on counterfactual explanations, which can determine the importance score of each feature by the feature change rate of counterfactual samples. The origin of these feature importance scores is interpretable. Under the guidance of these interpretable feature importance scores, better feature subsets can be obtained and used to optimize artificial statistical features within the DP-CNN model. Based on the above methods, this paper proposes a new hybrid defect prediction model DPS-CNN-STR. Evaluating our model on six open source projects in terms of F1 score in defect prediction. Experimental results show that DPS-CNN-STR improves the state-of-the-art method by an average of 3.3%.

Keywords