Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study

Abdullateef O. Balogun; Shuib Basri; Saipunidzam Mahamad; Said J. Abdulkadir; Malek A. Almomani; Victor E. Adeyemo; Qasem Al-Tashi; Hammed A. Mojeed; Abdullahi A. Imam; Amos O. Bajeh

doi:10.3390/sym12071147

Symmetry (Jul 2020)

Affiliations

Abdullateef O. Balogun: Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak 32610, Malaysia
Shuib Basri: Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak 32610, Malaysia
Saipunidzam Mahamad: Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak 32610, Malaysia
Said J. Abdulkadir: Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak 32610, Malaysia
Malek A. Almomani: Department of Software Engineering, The World Islamic Sciences and Education University, Amman 11947, Jordan
Victor E. Adeyemo: School of Built Environment, Engineering and Computing, Leeds Beckett University, Headingley Campus, Leeds LS6 3QS, UK
Qasem Al-Tashi: Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak 32610, Malaysia
Hammed A. Mojeed: Department of Computer Science, University of Ilorin, Ilorin, Ilorin 1515, Nigeria
Abdullahi A. Imam: Department of Computer and Information Science, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Perak 32610, Malaysia
Amos O. Bajeh: Department of Computer Science, University of Ilorin, Ilorin, Ilorin 1515, Nigeria

DOI: https://doi.org/10.3390/sym12071147
Journal volume & issue: Vol. 12, no. 7, p. 1147

Abstract

Feature selection (FS) is a feasible solution for mitigating the high-dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). However, empirical studies on the impact and effectiveness of FS methods on SDP models often lead to contradictory experimental results and inconsistent findings. These contradictions can be attributed to study limitations such as small datasets, limited FS search methods, and unsuitable prediction models within the scope of the respective studies. It is hence critical to conduct an extensive empirical study to address these contradictions, to guide researchers, and to buttress the scientific tenacity of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The ensuing prediction models were evaluated based on accuracy and AUC values. Scott–KnottESD and the novel Double Scott–KnottESD rank statistical methods were used for statistical ranking of the studied FS methods. The experimental results showed that there is no single best FS method, as the performance of each method depends on the choice of classifier, performance evaluation metric, and dataset. However, we recommend the use of statistical-based, probability-based, and classifier-based filter feature ranking (FFR) methods, respectively, in SDP. For filter subset selection (FSS) methods, correlation-based feature selection (CFS) with metaheuristic search methods is recommended. For wrapper feature selection (WFS) methods, the IWSS-based WFS method is recommended, as it outperforms the conventional SFS and LHS-based WFS methods.
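As an illustration of what a filter feature ranking (FFR) method and an AUC-based evaluation involve, the following is a minimal sketch, not the study's implementation. It assumes absolute Pearson correlation with the defect label as a stand-in for a statistical-based ranking criterion, and it uses synthetic data; the function names `pearson`, `rank_features`, and `auc` are hypothetical and do not come from the paper.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)


def rank_features(X, y):
    """Filter feature ranking: sort column indices by |correlation with label|."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def auc(scores, labels):
    """AUC as the chance that a defective module outscores a clean one."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data (an assumption for illustration): 200 modules, 3 metrics.
# Only the metric at index 1 carries signal about the defect label.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(0, 1), label + rng.gauss(0, 0.5), rng.gauss(0, 1)] for label in y]

ranking = rank_features(X, y)
top = ranking[0]
top_auc = auc([row[top] for row in X], y)
print("feature ranking:", ranking)
print("AUC using the top-ranked feature as a score:", round(top_auc, 3))
```

A ranking of this kind is computed independently of any classifier, which is what distinguishes filter methods from the wrapper methods mentioned in the abstract; in a full experiment the top-ranked features would be passed to a classifier such as Naïve Bayes or a Decision Tree rather than scored directly.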

Published in Symmetry

ISSN: 2073-8994 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/symmetry/
