IEEE Access (Jan 2018)

HDA: Cross-Project Defect Prediction via Heterogeneous Domain Adaptation With Dictionary Learning

  • Zhou Xu,
  • Peipei Yuan,
  • Tao Zhang,
  • Yutian Tang,
  • Shuai Li,
  • Zhen Xia

DOI
https://doi.org/10.1109/ACCESS.2018.2873755
Journal volume & issue
Vol. 6
pp. 57597 – 57613

Abstract

Cross-Project Defect Prediction (CPDP) is an active research topic that predicts defects in projects with scarce labeled data (target projects) by reusing classification models from other projects (source projects). Traditional CPDP methods require the data of the two projects to share common features and use those features to construct defect prediction models. However, when cross-project data do not satisfy this requirement, i.e., in the heterogeneous CPDP (HCPDP) scenario, these methods become infeasible. In this paper, we propose a novel HCPDP method called Heterogeneous Domain Adaptation (HDA) to address the issue. HDA treats the cross-project data as coming from two different domains with heterogeneous feature sets. It employs a domain adaptation method to embed the data from the two domains into a comparable feature space of lower dimension, and then measures the difference between the two mapped domains using dictionaries learned from them with a dictionary learning technique. We comprehensively evaluate HDA on 94 cross-project pairs of 12 projects from three open-source defect data sets with three performance indicators, i.e., F-measure, Balance, and AUC. The experimental results indicate that, compared with two state-of-the-art HCPDP methods, HDA improves F-measure by 0.219 and 0.336, Balance by 0.185 and 0.215, and AUC by 0.131 and 0.035, respectively. In addition, in most cases HDA achieves results comparable to those of the Within-Project Defect Prediction (WPDP) setting and of a state-of-the-art unsupervised learning method.
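
The abstract names three performance indicators but does not give their formulas. As a minimal sketch, assuming the definitions commonly used in the defect prediction literature (Balance as one minus the normalized Euclidean distance from the ideal point pd = 1, pf = 0, and AUC as the probability that a defective module is scored above a clean one), they could be computed as follows. The function names and the toy inputs are hypothetical and are not taken from the paper.

```python
import math


def f_measure_and_balance(tp, fp, tn, fn):
    """Return (F-measure, Balance) from confusion-matrix counts.

    Assumed definitions (not stated in the abstract):
      pd = recall = tp / (tp + fn)    probability of detection
      pf = fp / (fp + tn)             probability of false alarm
      Balance = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    denom = precision + pd
    f_measure = 2 * precision * pd / denom if denom else 0.0
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return f_measure, balance


def auc(scores, labels):
    """Pairwise AUC: fraction of (defective, clean) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of
    modules, which is fine for an illustration.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one defective and one clean module")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy counts and scores, for illustration only.
    print(f_measure_and_balance(tp=30, fp=10, tn=50, fn=10))
    print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

All three indicators lie in [0, 1], with higher values being better, which is why the improvements in the abstract are reported as absolute differences.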

Keywords