IEEE Access (Jan 2024)
Heterogeneous Cross-Project Defect Prediction Using Encoder Networks and Transfer Learning
Abstract
Heterogeneous cross-project defect prediction (HCPDP) aims to predict defects in new software projects using defect data from previous software projects, where the source and target projects differ in some of their metrics. Most existing methods capture only linear relationships in software defect datasets, and they use multiple defect datasets from different projects as source datasets. In this paper, we propose a novel method called heterogeneous cross-project defect prediction using encoder networks and transfer learning (ENTL). ENTL uses encoder networks to extract the important features from the source and target datasets. To minimize negative transfer during transfer learning, we use an augmented dataset that contains pseudo-labels and the source dataset. In addition, we train the model using a single dataset. To evaluate the performance of ENTL, we used 16 datasets from four publicly available software defect projects. We compared the proposed method with four HCPDP methods, namely EGW, HDP_KS, CTKCCA, and EMKCA, and with one within-project defect prediction (WPDP) method from the existing literature. On average, the proposed method outperforms the baseline methods in terms of PD, PF, F1-score, G-mean, and AUC.
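For readers unfamiliar with the evaluation measures named above, the following minimal sketch shows how four of them are commonly computed from a binary confusion matrix. It is an illustration under stated assumptions, not the evaluation code used in this work: the function name `defect_metrics` and its arguments are hypothetical, PD is taken to be recall on the defective class, PF the false-alarm rate, and G-mean the geometric mean of PD and (1 - PF), which is one common definition in the defect prediction literature. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def defect_metrics(tp, fp, tn, fn):
    """Compute PD, PF, F1 and G-mean from confusion-matrix counts.

    Hypothetical helper for illustration; the definitions below are
    assumed, commonly used ones, not taken from the paper itself.
    """
    # PD: probability of detection (recall on the defective class).
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # PF: probability of false alarm (false positive rate).
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Precision is only needed as an intermediate for the F1-score.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and PD.
    f1 = 2 * precision * pd / (precision + pd) if (precision + pd) else 0.0
    # G-mean: geometric mean of PD and (1 - PF) (assumed definition).
    g_mean = math.sqrt(pd * (1.0 - pf))
    return {"PD": pd, "PF": pf, "F1": f1, "G-mean": g_mean}


if __name__ == "__main__":
    # Made-up counts, used purely to exercise the function.
    print(defect_metrics(tp=8, fp=2, tn=18, fn=2))
```

Higher PD, F1-score, and G-mean indicate better performance, whereas a lower PF is better, so "outperforms in terms of PF" means fewer false alarms.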
Keywords