IEEE Access (Jan 2022)

Cross-Project Software Defect Prediction Based on Class Code Similarity

  • Wanzhi Wen,
  • Chenqiang Shen,
  • Xiaohong Lu,
  • Zhixian Li,
  • Haoren Wang,
  • Ruinian Zhang,
  • Ningbo Zhu

DOI
https://doi.org/10.1109/ACCESS.2022.3211401
Journal volume & issue
Vol. 10
pp. 105485 – 105495

Abstract

Software defect prediction techniques can help software developers find software defects as early as possible and can also reduce the cost of software development. Cross-project defect prediction (CPDP) usually uses an entire source project to predict the target project. However, the data distribution of an entire source project generally differs substantially from that of the target project, which limits prediction accuracy. We propose CCS-CPDP, a cross-project software defect prediction technique based on class code similarity. First, the technique converts the code set extracted from the AST (Abstract Syntax Tree) into a vector set using the DTI (Doc2Bow and TF-IDF) strategy. Second, it calculates the similarity between the vector sets of the target project and the training projects. Finally, following the majority-vote principle of KNN, it determines how many of the most similar class instances to take from the training projects, refines the source project by selecting those class instances, and then predicts and evaluates software defects. We compared CCS-CPDP with software defect prediction methods based on four traditional classification models (KNN, Random Forest, Naive Bayes, and Logistic Regression). Experimental results show that CCS-CPDP can improve the effectiveness of CPDP in terms of recall and F1-score.
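The vectorization, similarity, and instance-selection steps described in the abstract can be sketched in plain Python. This is a minimal sketch, not the authors' implementation, and it rests on several assumptions: each class has already been reduced to a list of AST-derived tokens (AST extraction itself is omitted); the TF-IDF weight is a plain tf × log2(N/df) with L2 normalization; similarity is cosine similarity; and the refined training set is the union of the k most similar source classes for each target class, with a fixed, hypothetical `k` standing in for the paper's KNN-based rule for choosing that number. All function names are hypothetical.

```python
import math
from collections import Counter


def build_dictionary(corpus):
    """Assign an integer id to every distinct token (a Doc2Bow-style dictionary)."""
    vocab = {}
    for tokens in corpus:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def doc2bow(tokens, vocab):
    """Bag-of-words for one class: {token_id: count}, ignoring unknown tokens."""
    return dict(Counter(vocab[t] for t in tokens if t in vocab))


def tfidf(bows):
    """Weight each bag by tf * log2(N / df), then L2-normalise it (assumed DTI weighting)."""
    n = len(bows)
    df = Counter(tid for bow in bows for tid in bow)
    vectors = []
    for bow in bows:
        vec = {tid: tf * math.log2(n / df[tid]) for tid, tf in bow.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # An all-zero vector (every token occurs in every class) stays empty.
        vectors.append({tid: w / norm for tid, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already L2-normalised sparse vectors."""
    return sum(w * v.get(tid, 0.0) for tid, w in u.items())


def select_training_instances(source_docs, target_docs, k=3):
    """Hypothetical selection rule: return the sorted indices of source classes
    that are among the k most similar to at least one target class."""
    corpus = source_docs + target_docs
    vocab = build_dictionary(corpus)
    vectors = tfidf([doc2bow(doc, vocab) for doc in corpus])
    src_vecs = vectors[:len(source_docs)]
    tgt_vecs = vectors[len(source_docs):]
    selected = set()
    for t in tgt_vecs:
        ranked = sorted(range(len(src_vecs)),
                        key=lambda i: cosine(t, src_vecs[i]),
                        reverse=True)
        selected.update(ranked[:k])
    return sorted(selected)


if __name__ == "__main__":
    # Toy token lists standing in for AST-extracted class code.
    source = [
        ["if", "for", "read", "file"],
        ["socket", "connect", "send"],
        ["for", "read", "buffer"],
        ["gui", "button", "click"],
    ]
    target = [["read", "file", "for"], ["socket", "send", "retry"]]
    print(select_training_instances(source, target, k=1))  # → [0, 1]
```

In this toy run, the file-reading target class picks the file-reading source class and the socket target class picks the socket source class, while the unrelated GUI class is dropped. The selected source classes would then serve as the refined training data for a defect classifier; which classifier and which `k` to use are left open here.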

Keywords