IEEE Access (Jan 2019)
Analyses of Classifier’s Performance Measures Used in Software Fault Prediction Studies
Abstract
Assessing software quality is both important and difficult. For this purpose, software fault prediction (SFP) models have been used extensively. However, selecting the right model, and declaring the best among multiple models, depends on the performance measures used. We analyze 14 frequently used, non-graphic classifiers' performance measures employed in SFP studies. These analyses help machine learning practitioners and researchers in SFP select the most appropriate performance measure for model evaluation. We first analyze the performance measures for resilience against producing invalid values through our proposed plausibility criterion. We then perform consistency and discriminancy analyses to identify the best of the 14 performance measures. Finally, we rank the selected performance measures from better to worse on both balanced and imbalanced datasets. Our analyses conclude that the F-measure and the G-mean1 are equally the best candidates for evaluating SFP models, provided the results are interpreted carefully, as there is a risk of invalid values in certain scenarios.
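The risk of invalid values mentioned above can be illustrated with a small sketch. Assuming the definitions commonly used in the SFP literature, F-measure as the harmonic mean of precision and recall and G-mean1 as their geometric mean, both measures become undefined when a classifier makes no positive predictions (precision has a zero denominator). The function names below are illustrative, not from the paper:

```python
from math import sqrt

def precision(tp, fp):
    """Precision = TP / (TP + FP); undefined (None) with no positive predictions."""
    return tp / (tp + fp) if (tp + fp) > 0 else None

def recall(tp, fn):
    """Recall = TP / (TP + FN); undefined (None) with no actual positives."""
    return tp / (tp + fn) if (tp + fn) > 0 else None

def f_measure(tp, fp, fn):
    """F-measure = 2PR / (P + R), the harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    if p is None or r is None or (p + r) == 0:
        return None  # invalid value: the measure cannot be computed
    return 2 * p * r / (p + r)

def g_mean1(tp, fp, fn):
    """G-mean1 = sqrt(P * R), the geometric mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    if p is None or r is None:
        return None  # invalid value
    return sqrt(p * r)

# A well-behaved confusion matrix: both measures are defined.
print(f_measure(tp=40, fp=10, fn=10))
print(g_mean1(tp=40, fp=10, fn=10))

# A degenerate classifier predicting everything negative:
# precision is 0/0, so both measures yield an invalid value.
print(f_measure(tp=0, fp=0, fn=50))
```

Returning `None` (rather than silently substituting 0) makes the invalid-value scenarios visible, which is the kind of careful result analysis the abstract recommends.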
Keywords