IEEE Access (Jan 2024)

DP-CCL: A Supervised Contrastive Learning Approach Using CodeBERT Model in Software Defect Prediction

  • Sadia Sahar,
  • Muhammad Younas,
  • Muhammad Murad Khan,
  • Muhammad Umer Sarwar

DOI
https://doi.org/10.1109/ACCESS.2024.3362896
Journal volume & issue
Vol. 12
pp. 22582–22594

Abstract

Software Defect Prediction (SDP) reduces the overall cost of software development by identifying code at higher risk of defects in the early phases of development, helping test engineers allocate testing resources more effectively. Traditional SDP models are built on handcrafted software metrics that ignore the structural, semantic, and contextual information in the code; consequently, many researchers have employed deep learning models to capture that information. In this article, we propose the DP-CCL (Defect Prediction using CodeBERT with Contrastive Learning) model to predict defective code. The proposed model applies supervised contrastive learning to the CodeBERT language model to extract semantic features from source code. Contrastive learning extracts valuable information from the data by maximizing the similarity between similar data pairs (positive pairs) while minimizing the similarity between dissimilar data pairs (negative pairs). Moreover, the model combines the semantic features with software metrics to obtain the benefits of both semantic and handcrafted features. The combined features are input to a logistic regression model that classifies code as either buggy or clean. In this study, ten PROMISE projects were used to conduct the experiments. Results show that the DP-CCL model achieved a significant improvement, a 4.9% to 14.9% increase in F-score compared with existing approaches.
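The supervised contrastive objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration of the standard supervised contrastive (SupCon) loss over a batch of embeddings, not the authors' implementation: the function name, temperature value, and toy embeddings are illustrative assumptions, and in DP-CCL the embeddings would come from CodeBERT.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss over a batch (illustrative sketch).

    Same-label samples act as positive pairs (similarity pulled up),
    different-label samples as negative pairs (similarity pushed down).
    """
    # L2-normalize embeddings so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature

    n = len(labels)
    logits_mask = ~np.eye(n, dtype=bool)          # exclude self-similarity
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask

    # log-softmax of each anchor's similarities over all other samples
    sim_max = sim.max(axis=1, keepdims=True)       # numerical stability
    exp_sim = np.exp(sim - sim_max) * logits_mask
    log_prob = sim - sim_max - np.log(exp_sim.sum(axis=1, keepdims=True))

    # average log-probability of positives per anchor (anchors with positives)
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (pos_mask * log_prob).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()

# Toy batch: two "buggy" and two "clean" embeddings
z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss_separated = supcon_loss(z, np.array([0, 0, 1, 1]))  # labels match clusters
loss_mixed = supcon_loss(z, np.array([0, 1, 0, 1]))      # labels cut across clusters
```

When labels align with the embedding clusters the loss is near zero, and it grows when positives are far apart, which is the signal that drives the representation learning; the resulting semantic features would then be concatenated with handcrafted software metrics and passed to a logistic regression classifier, as the abstract describes.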