PeerJ Computer Science (Oct 2021)
Inheritance metrics feats in unsupervised learning to classify unlabeled datasets and clusters in fault prediction
Abstract
Fault prediction is essential for delivering high-quality software. The absence of training data and of a mechanism for labeling a cluster as faulty or fault-free is a topic of concern in software fault prediction (SFP). Inheritance is an important feature of object-oriented development, and its metrics measure the complexity, depth, and breadth of software. In this paper, we aim to experimentally validate how helpful inheritance metrics are for classifying unlabeled datasets, and we propose a novel mechanism for labeling a cluster as faulty or fault-free. We collected ten public datasets that contain inheritance and Chidamber and Kemerer (C&K) metrics. Each base dataset was then split into two datasets for evaluation: one with C&K plus inheritance metrics, and one with C&K metrics only. K-means clustering with Euclidean distance is applied, and the resulting clusters are labeled through the average mechanism. Finally, TPR, Recall, Precision, F1 measures, and ROC are computed to measure performance. The results show an adequate impact of inheritance metrics in SFP, specifically in classifying unlabeled datasets and in the correct classification of instances. The experiment also reveals that the average mechanism is suitable for labeling clusters in SFP. Quality assurance practitioners can benefit from using inheritance metrics to label datasets and clusters.
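The pipeline summarized in the abstract (K-means with Euclidean distance, followed by labeling each cluster through an average) can be illustrated with a minimal sketch. The abstract does not define the average mechanism precisely, so the rule used here is an assumption: a cluster is called faulty when its centroid exceeds the dataset-wide mean on more than half of the metrics. The function names, the deterministic seeding, and the six example rows (columns WMC, DIT, NOC, CBO) are likewise hypothetical and are not taken from the paper's datasets.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two metric vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(rows):
    # Column-wise mean of a non-empty list of metric vectors.
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def kmeans_two_clusters(rows, max_iter=100):
    # Assumed deterministic seeding: start from the rows with the smallest
    # and the largest metric sum, so the example is reproducible.
    ordered = sorted(rows, key=sum)
    centroids = [list(ordered[0]), list(ordered[-1])]
    assignment = None
    for _ in range(max_iter):
        # Assign every row to its nearest centroid.
        new_assignment = [
            min((0, 1), key=lambda c: euclidean(row, centroids[c]))
            for row in rows
        ]
        if new_assignment == assignment:
            break  # converged: no row changed cluster
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in (0, 1):
            members = [r for r, a in zip(rows, assignment) if a == c]
            if members:
                centroids[c] = mean_vector(members)
    return assignment, centroids

def label_clusters(rows, centroids):
    # Assumed "average" rule: a cluster is faulty when its centroid lies
    # above the overall dataset mean on more than half of the metrics.
    overall = mean_vector(rows)
    labels = []
    for centroid in centroids:
        above = sum(1 for c, m in zip(centroid, overall) if c > m)
        labels.append("faulty" if above > len(overall) / 2 else "fault-free")
    return labels

# Hypothetical module metrics: [WMC, DIT, NOC, CBO]
modules = [
    [3, 1, 0, 2],
    [4, 1, 0, 3],
    [5, 2, 1, 2],
    [25, 5, 4, 14],
    [30, 6, 3, 18],
    [28, 4, 5, 16],
]

assignment, centroids = kmeans_two_clusters(modules)
labels = label_clusters(modules, centroids)
for row, cluster in zip(modules, assignment):
    print(row, "->", labels[cluster])
```

Comparing centroids against the overall mean keeps the labeling step fully unsupervised, since no fault labels are needed at any point. Dropping the DIT and NOC columns from `modules` would mimic the C&K-only variant of a dataset for a side-by-side comparison.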
Keywords