Analysis of Feature Selection Methods in Software Defect Prediction Models

Misbah Ali; Tehseen Mazhar; Tariq Shahzad; Yazeed Yasin Ghadi; Syed Muhammad Mohsin; Syed Muhammad Abrar Akber; Mohammed Ali

doi:10.1109/ACCESS.2023.3343249

IEEE Access (Jan 2023)

Affiliations

Misbah Ali: Department of Computer Science, Virtual University of Pakistan, Lahore, Pakistan
Tehseen Mazhar: Department of Computer Science, Virtual University of Pakistan, Lahore, Pakistan
Tariq Shahzad: Department of Computer Sciences, COMSATS University Islamabad, Sahiwal Campus, Sahiwal, Pakistan
Yazeed Yasin Ghadi: Department of Computer Science and Software Engineering, Al Ain University, Abu Dhabi, United Arab Emirates
Syed Muhammad Mohsin: Department of Computer Science, COMSATS University Islamabad, Islamabad, Pakistan
Syed Muhammad Abrar Akber: Department of Computer Graphics, Vision and Digital Systems, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
Mohammed Ali: Department of Computer Science, Applied College, King Khalid University, Abha, Saudi Arabia

Journal volume & issue: Vol. 11
pp. 145954 – 145974

Abstract

Improving software quality by proactively detecting potential defects during development is a major goal of software engineering, and software defect prediction plays a central role in achieving it. Data analytics and machine learning make it possible to focus quality assurance effort where it is needed most. A key factor in the success of software fault prediction is selecting relevant features and reducing data dimensionality. Feature selection methods contribute by identifying the most informative attributes among a large set of candidate features, and they can significantly improve the accuracy and efficiency of fault prediction models. However, feature selection for software fault prediction is a broad and constantly evolving field, with a wide variety of techniques and tools available. Motivated by these considerations, this systematic literature review investigates the feature selection methods used in software fault prediction. The review applies a refined search strategy across four reputable digital libraries (IEEE Xplore, ScienceDirect, ACM Digital Library, and SpringerLink) and rigorously analyzes 49 selected primary studies from 2014. The results highlight four main findings. First, filter and hybrid feature selection methods are the most prevalent. Second, the most commonly used single classifiers are Naïve Bayes, Support Vector Machine, and Decision Tree, and the most commonly used ensemble classifiers are Random Forest, Bagging, and AdaBoost. Third, performance is most often evaluated using area under the curve (AUC), accuracy, and F-measure. Finally, there is a clear preference for tools such as WEKA, MATLAB, and Python. By providing insight into current trends and practices, this study offers guidance that researchers and practitioners can use to make informed decisions when building software fault prediction models, and thereby to improve overall software quality.
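
As an illustration of the filter-based feature selection and AUC evaluation mentioned above, the following minimal sketch ranks candidate metrics by the absolute value of their Pearson correlation with the defect label and keeps the top k. It is not taken from the reviewed studies: the metric names (loc, complexity, noise), the toy data, and the helper functions filter_select and auc are all hypothetical, and correlation is only one of many possible filter criteria.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no ranking information
    return cov / sqrt(var_x * var_y)


def filter_select(rows, labels, names, k):
    """Filter method: score each feature independently of any classifier,
    then keep the k features with the highest |correlation with the label|."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (defective, clean) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data set: one row per module, label 1 = defective.
names = ["loc", "complexity", "noise"]
rows = [
    [120, 14, 3],
    [30, 2, 7],
    [200, 22, 7],
    [45, 4, 3],
    [160, 18, 5],
    [25, 3, 5],
]
labels = [1, 0, 1, 0, 1, 0]

selected, feature_scores = filter_select(rows, labels, names, k=2)
loc_auc = auc([row[0] for row in rows], labels)
print("selected features:", selected)
print("AUC using loc as the score:", loc_auc)
```

In this toy data the noise column is uncorrelated with the label, so the filter discards it. A wrapper or hybrid method would instead score feature subsets by training a classifier on them, which is typically more expensive.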

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
