Alexandria Engineering Journal (Jan 2025)
Cross-project software defect prediction based on the reduction and hybridization of software metrics
Abstract
Cross-project defect prediction (CPDP) plays an essential role in identifying potential defects in target projects, especially those with limited historical data, by using relevant information from similar source projects. Current studies focus on three main types of software metrics for CPDP: static metrics, code-change metrics, and semantic features. However, these studies face two primary challenges: class overlap caused by reduced feature dimensions, and multicollinearity caused by integrating various software metrics. To address these challenges, we propose a CPDP model based on both reduction and hybridization techniques (RH-CPDP). The proposed model uses hybrid deep neural networks as the hybridization technique to combine the essential metrics from all metric categories, which addresses the class overlap issue and enhances the efficiency of the prediction model. Principal component analysis (PCA) serves as the reduction method: it keeps the number of metrics small, focuses on the influential relationships between metrics and fault proneness, and avoids the multicollinearity problem. Experiments on nine open-source projects from the PROMISE dataset show that RH-CPDP outperforms current CPDP methods (TCSBoost, TPTL, DA-KTSVMO, DBN, and 3SW-MSTL) in terms of area under the curve (AUC) and F1-measure. These findings highlight the effectiveness of RH-CPDP in improving the performance of CPDP techniques.
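The two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. The function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions made for illustration; the abstract does not say how thresholds were chosen. AUC is computed here as the fraction of (defective, clean) module pairs that the model ranks correctly, and F1 as the harmonic mean of precision and recall.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Counts the fraction of (defective, clean) pairs in which the defective
    module receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_measure(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed threshold (assumed 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical predictions for a target project: 1 = defective, 0 = clean.
target_labels = [1, 0, 1, 0, 0, 1, 0, 0]
target_scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.3, 0.5]

print(f"AUC = {auc(target_labels, target_scores):.3f}")
print(f"F1  = {f1_measure(target_labels, target_scores):.3f}")
```

AUC does not depend on a threshold, whereas F1 does, which is one reason defect prediction studies commonly report both.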