International Journal of Computational Intelligence Systems (Jun 2024)

Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

  • Xuanye Wang,
  • Lu Lu,
  • Zhanyu Yang,
  • Qingyan Tian,
  • Haisha Lin

DOI
https://doi.org/10.1007/s44196-024-00551-3
Journal volume & issue
Vol. 17, no. 1
pp. 1 – 16

Abstract

Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only reduce the workload of software testers but also improve the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advances. However, these methods still have limitations in reliability and usability. We therefore introduce MSDD-(IA)3, a novel framework that leverages the pre-trained CodeT5+ and (IA)3 for parameter-efficient multi-classification SDD. The framework builds a detection model on the pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we inject (IA)3 vectors into specific layers and update only these injected parameters, which reduces the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 outperforms state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the trainable parameters of MSDD-(IA)3 amount to only 0.04% of the parameters of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
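The parameter-efficiency idea described in the abstract can be illustrated with a minimal sketch. It assumes the standard (IA)3 formulation, in which a small learned vector, initialised to ones, rescales the output of a frozen layer element-wise. The class names, the toy 3x2 weight matrix, and the hand-written gradient step are hypothetical and are not taken from the authors' code, which applies the technique to selected layers of CodeT5+.

```python
# Toy, pure-Python sketch of the (IA)3 idea: a frozen linear layer whose
# output is rescaled element-wise by a small trainable vector.
# All names and sizes are illustrative assumptions, not the paper's code.

class FrozenLinear:
    """A linear map y = W x whose weights are never updated."""

    def __init__(self, weights):
        self.weights = weights  # list of rows, shape (d_out, d_in)

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]

    def num_params(self):
        return sum(len(row) for row in self.weights)


class IA3Scaled:
    """Wraps a frozen layer with a trainable scaling vector (init: ones)."""

    def __init__(self, layer, d_out):
        self.layer = layer
        self.scale = [1.0] * d_out  # the only trainable parameters

    def forward(self, x):
        return [s * y for s, y in zip(self.scale, self.layer.forward(x))]

    def sgd_step(self, x, grad_out, lr):
        # d(out_i)/d(scale_i) = (W x)_i, so only the scale vector changes;
        # the wrapped layer's weights are left untouched.
        base = self.layer.forward(x)
        self.scale = [s - lr * g * b
                      for s, g, b in zip(self.scale, grad_out, base)]


frozen = FrozenLinear([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
adapted = IA3Scaled(frozen, d_out=3)

x = [1.0, 1.0]
before = adapted.forward(x)  # equals the frozen layer's output at init
adapted.sgd_step(x, grad_out=[1.0, 0.0, -1.0], lr=0.01)
after = adapted.forward(x)   # changed only through the scale vector

trainable_fraction = len(adapted.scale) / (
    frozen.num_params() + len(adapted.scale)
)
```

In this toy case the trainable fraction is large (3 of 9 parameters) because the weight matrix is tiny. In a real transformer layer the scaling vector grows with the hidden dimension while the frozen weight matrices grow with its square, which is why the trainable share can become very small.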

Keywords