Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Xuanye Wang; Lu Lu; Zhanyu Yang; Qingyan Tian; Haisha Lin

doi:10.1007/s44196-024-00551-3

International Journal of Computational Intelligence Systems (Jun 2024)

Affiliations

Xuanye Wang: School of Computer Science and Engineering, South China University of Technology
Lu Lu: School of Computer Science and Engineering, South China University of Technology
Zhanyu Yang: School of Computer Science and Engineering, South China University of Technology
Qingyan Tian: Guangdong Province Key Laboratory of Tunnel Safety and Emergency Support Technology and Equipment
Haisha Lin: Guangdong Province Key Laboratory of Tunnel Safety and Emergency Support Technology and Equipment

Journal volume & issue: Vol. 17, no. 1
pp. 1 – 16

Abstract

Software Defect Detection (SDD) has always been critical to the software development life cycle. A stable defect detection system can not only reduce the workload of software testers but also improve the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods that have achieved significant advances. However, these methods still have limitations in reliability and usability. We therefore introduce MSDD-(IA)3, a novel framework that leverages the pre-trained CodeT5+ model and (IA)3 for parameter-efficient multi-classification SDD. The framework builds a detection model on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Because of the high overhead of pre-trained LLMs, we inject (IA)3 vectors into specific layers and update only these injected parameters, which reduces the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 outperforms state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and the Matthews Correlation Coefficient. Notably, the number of trainable parameters in MSDD-(IA)3 is only 0.04% of that of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
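The abstract states that only the injected (IA)3 vectors are trained while the CodeT5+ weights stay frozen. The sketch below illustrates that idea on one simplified Transformer block in plain Python, following the general (IA)3 formulation in which learned vectors rescale the attention keys, the attention values, and the feed-forward activations element-wise. It is a hypothetical illustration, not the authors' implementation: the names `IA3Block` and `ia3_param_counts`, the single-head attention, the ReLU, the omission of layer norm, and the sizes d = 768 and d_ff = 3072 are all assumptions, and the specific layers that MSDD-(IA)3 injects are defined in the paper, not here.

```python
import math
import random

# Hypothetical sketch of (IA)3 on one simplified Transformer block;
# names, sizes, and simplifications are assumptions, not MSDD-(IA)3 code.


def matvec(W, x):
    """Return W @ x for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    """Random stand-in for pre-trained weights, which stay frozen."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def ia3_param_counts(d, d_ff):
    """(frozen, trainable) parameter counts for one block (biases ignored)."""
    frozen = 4 * d * d + 2 * d * d_ff   # W_q, W_k, W_v, W_o, W_1, W_2
    trainable = 2 * d + d_ff            # l_k, l_v, l_ff
    return frozen, trainable


class IA3Block:
    """Frozen block whose keys, values, and FFN activations are rescaled
    element-wise by the trainable vectors l_k, l_v, and l_ff."""

    def __init__(self, d, d_ff, seed=0):
        rng = random.Random(seed)
        # Frozen "pre-trained" projections.
        self.W_q, self.W_k, self.W_v, self.W_o = (rand_matrix(d, d, rng) for _ in range(4))
        self.W_1 = rand_matrix(d_ff, d, rng)
        self.W_2 = rand_matrix(d, d_ff, rng)
        # Trainable (IA)3 vectors, initialised to ones so the block
        # initially behaves exactly like the frozen block.
        self.l_k = [1.0] * d
        self.l_v = [1.0] * d
        self.l_ff = [1.0] * d_ff

    def attention(self, xs):
        """Single-head self-attention over a list of token vectors."""
        qs = [matvec(self.W_q, x) for x in xs]
        # (IA)3: rescale keys and values element-wise.
        ks = [[lk * k for lk, k in zip(self.l_k, matvec(self.W_k, x))] for x in xs]
        vs = [[lv * v for lv, v in zip(self.l_v, matvec(self.W_v, x))] for x in xs]
        scale = math.sqrt(len(self.l_k))
        out = []
        for q in qs:
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in ks]
            m = max(scores)                      # subtract max for a stable softmax
            weights = [math.exp(s - m) for s in scores]
            z = sum(weights)
            ctx = [sum(w * v[i] for w, v in zip(weights, vs)) / z
                   for i in range(len(self.l_v))]
            out.append(matvec(self.W_o, ctx))
        return out

    def feed_forward(self, x):
        h = [max(0.0, v) for v in matvec(self.W_1, x)]   # ReLU(W_1 x)
        h = [lf * v for lf, v in zip(self.l_ff, h)]      # (IA)3: rescale by l_ff
        return matvec(self.W_2, h)

    def forward(self, xs):
        """Residual attention, then residual FFN (layer norm omitted)."""
        xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, self.attention(xs))]
        return [[a + b for a, b in zip(x, self.feed_forward(x))] for x in xs]


if __name__ == "__main__":
    block = IA3Block(d=4, d_ff=8)
    print(block.forward([[0.1, -0.2, 0.3, 0.4], [0.5, 0.0, -0.1, 0.2]]))
    frozen, trainable = ia3_param_counts(768, 3072)   # assumed T5-base-like sizes
    print(frozen, trainable, f"{100 * trainable / frozen:.3f}%")  # 7077888 4608 0.065%
```

Because every scaling vector starts at 1, the adapted block initially reproduces the frozen block, and fine-tuning adjusts only 2d + d_ff values per block. The per-block fraction printed here (about 0.065%) is not expected to match the reported 0.04%, which is measured against the whole CodeT5+ model and depends on which layers receive (IA)3 vectors.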

Published in International Journal of Computational Intelligence Systems

ISSN: 1875-6891 (Print); 1875-6883 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44196