IET Software (Dec 2022)

An unsupervised cross project model for crashing fault residence identification

  • Xiao Liu,
  • Zhou Xu,
  • Dan Yang,
  • Meng Yan,
  • Weihan Zhang,
  • Haohan Zhao,
  • Lei Xue,
  • Ming Fan

DOI
https://doi.org/10.1049/sfw2.12073
Journal volume & issue
Vol. 16, no. 6
pp. 630 – 646

Abstract

Effectively detecting the root cause of faults that cause software crashes (i.e. crashing faults) is a critical quality assurance activity. Previous studies extracted features to characterise crash instances and built models to identify whether the residences of crashing faults are located inside the stack traces. These models are all supervised learning methods, which require labelled crash data. This study proposes an unsupervised model, called Transfer Spectral Clustering (TSC), for crashing fault residence identification when only unlabelled data are available. Unlike traditional unsupervised methods, which are applied to the data of an individual project, TSC transfers knowledge from auxiliary unlabelled data of a source project to assist the clustering task on the unlabelled data of the target project. As an unsupervised transfer learning method, TSC simultaneously considers the data manifold information of each individual project and the feature manifold information across projects to improve the clustering result. Extensive experiments are conducted on a benchmark dataset containing seven software projects, with five indicators used for performance evaluation. The results show that TSC achieves better performance than four clustering‐based unsupervised methods and competitive performance compared with eight supervised cross‐project methods.