Inheritance metrics feats in unsupervised learning to classify unlabeled datasets and clusters in fault prediction

Syed Rashid Aziz; Tamim Ahmed Khan; Aamer Nadeem

doi:10.7717/peerj-cs.722

PeerJ Computer Science (Oct 2021)

Affiliations

Syed Rashid Aziz: Department of Software Engineering, Bahria University, Islamabad, Pakistan
Tamim Ahmed Khan: Department of Software Engineering, Bahria University, Islamabad, Pakistan
Aamer Nadeem: Department of Software Engineering, Capital University of Science & Technology, Islamabad, Pakistan

DOI: https://doi.org/10.7717/peerj-cs.722
Journal volume & issue: Vol. 7, p. e722

Abstract

Fault prediction is necessary for delivering high-quality software. The absence of training data and of a mechanism for labeling a cluster as faulty or fault-free is a concern in software fault prediction (SFP). Inheritance is an important feature of object-oriented development, and its metrics measure the complexity, depth, and breadth of software. In this paper, we experimentally validate how helpful inheritance metrics are for classifying unlabeled datasets, and we propose a novel mechanism for labeling a cluster as faulty or fault-free. We collected ten public datasets that contain inheritance and C&K metrics. Each base dataset was then split into two datasets for evaluation, one labeled C&K with inheritance and the other C&K. K-means clustering is applied using the Euclidean distance, and the clusters are then labeled through the average mechanism. Finally, TPR, recall, precision, F1 measure, and ROC are computed to measure performance; the results show an adequate impact of inheritance metrics in SFP, specifically in classifying unlabeled datasets and in the correct classification of instances. The experiment also shows that the average mechanism is suitable for labeling clusters in SFP. Quality assurance practitioners can benefit from using inheritance metrics to label datasets and clusters.
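The pipeline described in the abstract (K-means with Euclidean distance, cluster labeling by averages, then precision and recall) can be illustrated with a minimal Python sketch. The abstract does not define the average mechanism, so the rule used here is an assumption: a cluster is labeled faulty when the mean of its centroid's metric values exceeds the mean over the whole dataset. The function names, the deterministic choice of initial centroids, and the toy metric values are likewise hypothetical and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two metric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_vector(rows):
    """Column-wise mean of a non-empty list of metric vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def kmeans(rows, k=2, iterations=100):
    """Plain K-means. Assumption: the first k rows seed the centroids,
    which keeps this sketch deterministic."""
    centroids = [list(r) for r in rows[:k]]
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: euclidean(r, centroids[c]))
            for r in rows
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids are too
        assignment = new_assignment
        for c in range(k):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids


def label_clusters_by_average(rows, centroids):
    """Assumed labeling rule (not confirmed by the abstract): a cluster is
    'faulty' if its centroid's average metric value is above the dataset's."""
    overall = mean_vector(rows)
    overall_avg = sum(overall) / len(overall)
    return [
        "faulty" if sum(c) / len(c) > overall_avg else "fault-free"
        for c in centroids
    ]


def precision_recall_f1(predicted, actual, positive="faulty"):
    """Precision, recall (equal to TPR), and F1 for the positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: each row is [inheritance depth, class complexity].
rows = [[1, 5], [2, 6], [1, 4], [6, 30], [7, 28], [5, 35]]
assignment, centroids = kmeans(rows, k=2)
cluster_labels = label_clusters_by_average(rows, centroids)
predicted = [cluster_labels[a] for a in assignment]
print(predicted)
```

Because the labeling rule averages raw metric values that sit on different scales, a real experiment would probably normalize the metrics first; that step is omitted here to keep the sketch short.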

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/
