Software Defect Prediction Based on Optimized Machine Learning Models: A Comparative Study

Muhammad Zain Fawwaz Nuruddin Siswantoro; Umi Laili Yuhana

doi:10.34148/teknika.v12i2.634

Teknika (Jun 2023)

Affiliations

Muhammad Zain Fawwaz Nuruddin Siswantoro: Department of Informatics, Institut Teknologi Sepuluh Nopember, Surabaya, Jawa Timur
Umi Laili Yuhana: Department of Informatics, Institut Teknologi Sepuluh Nopember, Surabaya, Jawa Timur

DOI: https://doi.org/10.34148/teknika.v12i2.634
Journal volume & issue: Vol. 12, no. 2

Abstract

Software defect prediction is crucial for detecting possible defects in software before they manifest. While machine learning models have become more prevalent in software defect prediction, their effectiveness may vary based on the dataset and the hyperparameters of the model. Difficulties arise in determining the most suitable hyperparameters for the model, as well as in identifying the prominent features that serve as input to the classifier. This research aims to evaluate various traditional machine learning models that are optimized for software defect prediction on NASA MDP (Metrics Data Program) datasets. The datasets were classified using k-nearest neighbors (k-NN), decision trees, logistic regression, linear discriminant analysis (LDA), single hidden layer multilayer perceptron (SHL-MLP), and support vector machine (SVM). The hyperparameters of the models were fine-tuned using random search, and the feature dimensionality was reduced using principal component analysis (PCA). The synthetic minority oversampling technique (SMOTE) was applied to oversample the minority class in order to correct the class imbalance. k-NN was found to be the most suitable model for software defect prediction on several datasets, while SHL-MLP and SVM were also effective on certain datasets. Notably, logistic regression and LDA did not perform as well as the other models. Moreover, the optimized models outperformed the baseline models in terms of classification accuracy. The choice of model for software defect prediction should therefore be based on the specific characteristics of the dataset. Furthermore, hyperparameter tuning can improve the accuracy of machine learning models in predicting software defects.
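As a rough illustration of two of the steps named in the abstract, the following minimal sketch oversamples a minority class with a simplified SMOTE and then selects k for a k-NN classifier by random search. It is not the authors' implementation: the data are synthetic two-feature points rather than NASA MDP metrics, PCA is omitted for brevity, and every function name, parameter value, and the train/validation split are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Simplified SMOTE: build n_new synthetic points, each interpolated
    between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def knn_predict(train_X, train_y, x, k):
    """Majority vote (labels 0/1) among the k training points closest to x."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def random_search_k(train_X, train_y, val_X, val_y, candidates, n_iter, rng):
    """Random search: sample k values at random, keep the most accurate one."""
    best_k, best_acc = None, -1.0
    for _ in range(n_iter):
        k = rng.choice(candidates)
        correct = sum(knn_predict(train_X, train_y, x, k) == y
                      for x, y in zip(val_X, val_y))
        acc = correct / len(val_X)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


rng = random.Random(42)


def blob(center, n):
    """Hypothetical stand-in for module metrics: Gaussian points around center."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0))
            for _ in range(n)]


# Imbalanced training split: 40 "clean" modules (0), 8 "defective" modules (1)
train_clean, train_defect = blob((0, 0), 40), blob((3, 3), 8)
val_X = blob((0, 0), 10) + blob((3, 3), 10)
val_y = [0] * 10 + [1] * 10

# Oversample the minority class in the training split only
synthetic = smote(train_defect, len(train_clean) - len(train_defect), k=3, rng=rng)
train_X = train_clean + train_defect + synthetic
train_y = [0] * len(train_clean) + [1] * (len(train_defect) + len(synthetic))

candidates = [1, 3, 5, 7, 9, 11]  # odd values avoid tied votes
best_k, best_acc = random_search_k(train_X, train_y, val_X, val_y,
                                   candidates, n_iter=4, rng=rng)
print(f"selected k={best_k}, validation accuracy={best_acc:.2f}")
```

Oversampling is applied to the training split only, so the validation points stay untouched by synthetic data. In practice, library implementations such as those in scikit-learn and imbalanced-learn would likely replace these hand-written helpers.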

Published in Teknika

ISSN: 2549-8037 (Print); 2549-8045 (Online)
Publisher: Center for Research and Community Service, Institut Informatika Indonesia Surabaya
Country of publisher: Indonesia
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: http://ejournal.ikado.ac.id/index.php/teknika/
