Engineering Reports (Dec 2023)

Cross‐project defect prediction method based on genetic algorithm feature selection

  • Zhixi Hu,
  • Yi Zhu

DOI
https://doi.org/10.1002/eng2.12670
Journal volume & issue
Vol. 5, no. 12
pp. n/a – n/a

Abstract


With the continuous development of Internet technology, software plays an ever larger role in daily life, and software defect prediction (SDP) is a key means of ensuring software reliability. SDP predicts in advance which modules are likely to contain defects, based on the historical data of software projects, with the aim of making the best use of limited testing resources. In practice, however, the project to be predicted is often a new one with little or no historical data. How to exploit the abundant data of other, related projects to build a cross‐project software defect prediction (CPDP) model has therefore received extensive attention from scholars. Because of differences in data distribution and class imbalance between projects, however, CPDP performance suffers greatly. Building on CPDP, this article therefore proposes a feature selection method based on a genetic algorithm (genetic algorithm feature selection, GAFS). GAFS comprises two stages: feature selection and ensemble training. In the feature selection stage, a global-search adaptive method based on a genetic algorithm uses the ensemble training results of candidate feature subsets on the target data to transfer the optimal feature subset. In the ensemble training stage, the EasyEnsemble method is used to alleviate the class imbalance problem: multiple naive Bayes classifiers are constructed and then combined into the final model through ensemble learning. F1‐score and MCC are used as evaluation indicators, and comparative experiments are carried out on the AEEEM and Promise datasets. Compared with the five baseline methods, GAFS improves the average F1‐score by 38.9%, 31.6%, 35.1%, 22.0%, and 31.6%, respectively, and likewise improves the average MCC. In most cases it effectively improves model performance and achieves better prediction results.
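The abstract gives no pseudocode for the feature selection stage. As a rough illustration of the general idea of genetic-algorithm feature selection, the following is a minimal pure-Python sketch: chromosomes are bitmasks over the feature set, and the fitness function here is a hypothetical stand-in (in GAFS itself, fitness would be the ensemble's F1-score/MCC for the candidate subset evaluated against the target data). The `RELEVANT` set, population size, and rates are all illustrative assumptions, not values from the paper.

```python
import random

random.seed(0)

N_FEATURES = 10
POP_SIZE = 20
GENERATIONS = 30
MUT_RATE = 0.05

# Hypothetical stand-in fitness. In GAFS this would be the ensemble
# training result (F1-score/MCC) of the candidate subset on target data.
RELEVANT = {1, 3, 4, 7}  # assumed "useful" features for this toy example

def fitness(mask):
    chosen = {i for i, bit in enumerate(mask) if bit}
    hits = len(chosen & RELEVANT)
    noise = len(chosen - RELEVANT)
    return hits - 0.5 * noise  # reward relevant features, penalize extras

def crossover(a, b):
    # Single-point crossover of two parent bitmasks.
    cut = random.randrange(1, N_FEATURES)
    return a[:cut] + b[cut:]

def mutate(mask):
    # Flip each bit independently with probability MUT_RATE.
    return [bit ^ (random.random() < MUT_RATE) for bit in mask]

# Random initial population of candidate feature subsets.
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP_SIZE // 2]  # truncation selection (elitist)
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    pop = parents + children

best = max(pop, key=fitness)
print(sorted(i for i, b in enumerate(best) if b))
```

Because the parents survive unchanged each generation, the best fitness found never decreases; the search reliably concentrates on the assumed relevant features in this toy setting.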
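For the ensemble training stage, the paper's combination of EasyEnsemble with naive Bayes base learners can be sketched in miniature as follows. This is a hand-rolled illustration of the EasyEnsemble idea only: several balanced subsets are formed by undersampling the majority (non-defective) class, one base learner is trained per subset, and predictions are combined by majority vote. The threshold "learner", the 1-D toy data, and all constants are assumptions for the sketch (the actual method uses naive Bayes classifiers on real project metrics).

```python
import random

random.seed(1)

# Toy 1-D dataset: defective modules (label 1) are the minority class.
majority = [(random.gauss(0.0, 1.0), 0) for _ in range(90)]
minority = [(random.gauss(2.0, 1.0), 1) for _ in range(10)]

def train_threshold(data):
    # Trivial stand-in base learner: threshold at the midpoint of the
    # two class means (GAFS uses naive Bayes classifiers here).
    m0 = sum(x for x, y in data if y == 0) / sum(1 for _, y in data if y == 0)
    m1 = sum(x for x, y in data if y == 1) / sum(1 for _, y in data if y == 1)
    t = (m0 + m1) / 2
    return lambda x: int(x > t)

# EasyEnsemble: build several balanced subsets by undersampling the
# majority class, then train one base learner per subset.
models = []
for _ in range(5):
    subset = random.sample(majority, len(minority)) + minority
    models.append(train_threshold(subset))

def predict(x):
    # Final model: majority vote over the base learners.
    votes = sum(m(x) for m in models)
    return int(votes > len(models) / 2)

print(predict(2.5), predict(-0.5))
```

Each base learner sees a balanced 10-vs-10 sample, so no single learner is dominated by the majority class, while the vote across differently undersampled subsets keeps most of the majority-class information in play.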

Keywords