Determining Bug Prioritization Using Feature Reduction and Clustering With Classification

Shahid Iqbal; Rashid Naseem; Salman Jan; Sami Alshmrany; Muhammad Yasar; Arshad Ali

doi:10.1109/ACCESS.2020.3035063

IEEE Access (Jan 2020)

Affiliations

Shahid Iqbal: Department of Computer Science, City University of Science and Information Technology, Peshawar, Pakistan
Rashid Naseem: Department of IT and Computer Science, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology, Haripur, Pakistan
Salman Jan: Department of Computer Science, University of Peshawar, Peshawar, Pakistan
Sami Alshmrany: Faculty of Computer and Information Systems, Islamic University of Madinah, Medina, Saudi Arabia
Muhammad Yasar: Malaysian Institute of Information Technology, University of Kuala Lumpur, Kuala Lumpur, Malaysia
Arshad Ali: Faculty of Computer and Information Systems, Islamic University of Madinah, Medina, Saudi Arabia

DOI: https://doi.org/10.1109/ACCESS.2020.3035063
Journal volume & issue: Vol. 8
pp. 215661 – 215678

Abstract

Manually assigning accurate and timely priorities to bugs is resource-intensive and affects how important bugs are addressed. Existing work uses a single feature, which leads to information loss because bug reports carry many features, including “severity”, “component”, “operating system”, “owner”, “status”, “assigned to”, and “summary”. In this research, we propose an improved model for bug prioritization based on problem title, severity, and component. We convert these textual features to numeric features using Term Frequency-Inverse Document Frequency (TF-IDF). The conversion generates 5591 new features, which increases the complexity and running time of the algorithms. To reduce both, Non-negative Matrix Factorization (NMF) and Principal Component Analysis (PCA) are used. The proposed model combines feature reduction, clustering, and classification algorithms. Clustering is performed on both the full and the reduced feature sets, using the X-Means and K-Means algorithms. SVM and Naive Bayes classifiers are applied to all features, to reduced features, and to clustered features. The experiments use the Chromium, Eclipse, NetBeans, Mozilla, and Freedesktop datasets. Experimental results show better performance of the model, both with all features and with reduced features, in terms of precision, recall, F-score, and accuracy. The maximum improvement is achieved with reduced features. With all features, Chromium, Eclipse, Freedesktop, Mozilla, and NetBeans achieved accuracy improvements of 22.46%, 8.32%, 30.93%, 25.79%, and 37.78%, respectively. With reduced features, Chromium, Eclipse, Freedesktop, Mozilla, and NetBeans achieved 14.64%, 8.81%, 33.22%, 34.37%, and 41.01% accuracy, respectively. Overall, classification with clustering on reduced features performed better than classification on all features, classification with clustering on all features, and classification on reduced features. In all approaches, the SVM classifier outperformed Naive Bayes in terms of precision, recall, F-score, and accuracy. On average, the maximum accuracy is achieved by SVM with NMF and X-Means clustering.
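
As a rough illustration of the TF-IDF conversion step mentioned in the abstract, the sketch below turns a few bug records into numeric feature vectors. It is not the authors' implementation: the abstract does not state which TF-IDF variant was used, so the plain form tf × log(N / df) is an assumption, and the `tfidf` function and the sample bug strings are hypothetical. Each distinct term becomes one column, which shows why a real bug corpus can produce thousands of features and why a reduction step such as NMF or PCA follows.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, matrix) with one TF-IDF row per document.

    Assumed weighting: tf = term count / document length,
    idf = log(N / document frequency), with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for term in vocabulary:
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / doc_freq[term])
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical bug records: title, severity, and component joined as text.
bugs = [
    "crash on startup critical ui",
    "typo in settings dialog minor ui",
    "crash when saving file critical storage",
]

vocab, weights = tfidf(bugs)
print(len(vocab), "features for", len(bugs), "bug records")
```

Terms that occur in only one record (such as "startup") receive a higher weight than terms shared across records (such as "ui"), which is the property that lets downstream clustering and classification separate bug reports by their distinctive vocabulary.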

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
