Jisuanji kexue yu tansuo (Feb 2020)

Approach to Software Defect Features Selection Using Extended Bayesian Information Criterion

  • TU Jiping, QIAN Ye, WANG Wei, FAN Daoyuan, ZHANG Hanyu

DOI
https://doi.org/10.3778/j.issn.1673-9418.1810047
Journal volume & issue
Vol. 14, no. 2
pp. 215 – 235

Abstract

Using a large number of metrics to build a software defect prediction model can degrade the model's performance, because some of the metrics are unrelated to defects. Feature selection addresses this by building the prediction model on a subset of the feature dimensions of the defect data, which compresses the feature dimension, improves the accuracy of the prediction model, reduces its complexity, and saves computing resources. Traditional feature ranking methods evaluate only the influence of each single feature on the class label, which limits their effectiveness. Feature subset selection methods need to evaluate all feature subsets, which consumes computing resources, and they tend to select many features. This paper therefore proposes a feature selection method based on the extended Bayesian information criterion (EBIC-FS), which fits a linear regression to the data and selects the feature subset with the lowest sum of residuals and fewer feature dimensions. Experiments are conducted on the benchmark datasets M&R and Promise. The results show that the method compresses the feature dimension effectively and that the performance of the prediction model is greatly improved compared with 5 baseline methods.
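
As a rough illustration of the idea behind EBIC-FS, the following Python sketch scores feature subsets of a linear regression with the extended Bayesian information criterion. It is not the authors' implementation. The abstract does not state the search strategy or the criterion's parameters, so the greedy forward search, the Gaussian form EBIC = n ln(RSS/n) + k ln n + 2γ ln C(p, k), the value γ = 0.5, the synthetic data, and the names `ebic_score` and `ebic_forward_select` are all assumptions made for this example.

```python
import math
import numpy as np


def ebic_score(X, y, subset, gamma=0.5):
    """EBIC of an ordinary least squares fit on the given feature columns.

    Assumed form (Gaussian errors): n*ln(RSS/n) + k*ln(n) + 2*gamma*ln C(p, k).
    """
    n, p = X.shape
    k = len(subset)
    # Design matrix: intercept column plus the chosen feature columns.
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    rss = max(float(resid @ resid), 1e-12)  # guard against log(0)
    # ln C(p, k) computed with log-gamma so it stays stable for large p.
    log_binom = math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)
    return n * math.log(rss / n) + k * math.log(n) + 2.0 * gamma * log_binom


def ebic_forward_select(X, y, gamma=0.5):
    """Greedy forward search (an assumption, not taken from the paper):
    add the feature that lowers EBIC the most, stop when none lowers it."""
    p = X.shape[1]
    selected = []
    best = ebic_score(X, y, selected, gamma)
    while len(selected) < p:
        candidates = [j for j in range(p) if j not in selected]
        scores = {j: ebic_score(X, y, selected + [j], gamma) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining feature improves the criterion
        selected.append(j_best)
        best = scores[j_best]
    return selected, best


# Synthetic data (hypothetical): only columns 0 and 3 influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
selected, score = ebic_forward_select(X, y)
print(sorted(selected), round(score, 2))
```

In a defect prediction setting, `X` would hold the software metrics of each module and `y` the defect label. The penalty terms grow with the subset size `k`, so a subset is preferred only when the drop in residual error outweighs the cost of the extra dimensions, which matches the abstract's goal of low residuals with few features.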

Keywords