Evaluation of Classifiers in Software Fault-Proneness Prediction

F. Karimian; S. M. Babamir

doi:10.22044/jadm.2016.825

Journal of Artificial Intelligence and Data Mining (Jul 2017)

Affiliations

F. Karimian: Department of Computer Engineering, University of Kashan, Kashan, Iran.
S. M. Babamir: Department of Computer Engineering, University of Kashan, Kashan, Iran.

DOI: https://doi.org/10.22044/jadm.2016.825
Journal volume & issue: Vol. 5, no. 2
pp. 149 – 167

Abstract

Software reliability depends on how many of its modules are fault-prone: the fewer fault-prone modules a software system contains, the more it can be trusted. Therefore, if we can predict the number of fault-prone modules in a software system, we can judge its reliability. Software metrics are among the main features used in this prediction, since they allow modules to be classified as fault-prone or non-fault-prone. To perform such a classification, we investigated 17 classifiers whose features (attributes) are software metrics (39 metrics) and whose instances (software modules) are drawn from 13 datasets reported by NASA. However, two important issues influence prediction accuracy when data mining methods are used: (1) selecting the best or most influential features (i.e., software metrics) when a wide variety of them is available, and (2) sampling instances to balance imbalanced data, since a classifier trained on two imbalanced classes is biased toward the majority class. Based on feature selection and instance sampling, we considered 4 scenarios for appraising the 17 classifiers in predicting fault-prone software modules. To select features, we used Correlation-based Feature Selection (CFS), and to sample instances, we used the Synthetic Minority Oversampling Technique (SMOTE). Empirical results showed that suitable sampling of software modules significantly influences the accuracy of predicting software reliability, whereas metric selection has no considerable effect on the prediction.
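
As a rough illustration of the instance-sampling step named in the abstract, the sketch below shows the core idea of SMOTE: each synthetic minority instance is placed at a random point on the line segment between a minority instance and one of its k nearest minority neighbours. This is not the authors' implementation; the function name `smote`, its parameters, and the toy metric values are assumptions made for the example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length numeric feature vectors (hypothetical data).
    k: number of nearest minority neighbours considered for each base point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest minority neighbours (Euclidean distance).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate at a random position between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two software metrics per fault-prone module.
faulty_modules = [[10.0, 2.0], [12.0, 3.0], [11.0, 2.5], [30.0, 9.0]]
new_points = smote(faulty_modules, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority points, the new instances stay inside the range already covered by the minority class. In an evaluation, oversampling of this kind is normally applied to the training split only, so that the test data remain untouched.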

Published in Journal of Artificial Intelligence and Data Mining

ISSN: 2322-5211 (Print); 2322-4444 (Online)
Publisher: Shahrood University of Technology
Country of publisher: Iran, Islamic Republic of
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: http://jad.shahroodut.ac.ir/
