IEEE Access (Jan 2020)
A Suitable AST Node Granularity and Multi-Kernel Transfer Convolutional Neural Network for Cross-Project Defect Prediction
Abstract
Cross-project defect prediction (CPDP) is a feasible way to perform software defect prediction (SDP) when a project lacks historical data. Recent CPDP approaches have employed deep learning techniques to better exploit the information in programs' abstract syntax trees (ASTs). However, the granularity of the AST nodes and the difference in data distribution between projects may degrade prediction performance, and many CPDP studies do not take these factors into consideration. To handle these issues, this paper explores a more suitable AST node granularity and proposes a CPDP framework based on multi-kernel transfer convolutional neural networks. Specifically, for AST node granularity, we examine the differences among three AST node granularities and compare the prediction performance of each granularity across several prediction models. For the CPDP framework, we first parse the program source code into ASTs and encode the AST nodes into numerical vectors using an embedding technique. Second, to mine transferable semantic features, the encoded ASTs are fed into a convolutional neural network, in which a multi-kernel matching layer is added to minimize the data distribution divergence between the source and target projects. Finally, to make use of the information in the handcrafted features, the semantic features mined from the ASTs are combined with the handcrafted features to form joint features for CPDP. We evaluate our approach on 110 CPDP tasks formed from 11 open-source projects, and the results show that the proposed CPDP method outperforms most deep learning-based approaches.
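To illustrate the idea behind the multi-kernel matching layer, the following minimal sketch measures the distribution divergence between source-project and target-project feature vectors. It rests on assumptions not stated in the abstract: the divergence is taken to be a squared maximum mean discrepancy (MMD) computed with an equally weighted set of Gaussian kernels, and the function names, bandwidth values, and synthetic features are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix between the rows of x and the rows of y."""
    # Pairwise squared Euclidean distances, clipped to avoid tiny negatives.
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def multi_kernel_mmd2(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD, averaged over several Gaussian kernels.

    Assumption: equal kernel weights; the bandwidths are illustrative only.
    """
    total = 0.0
    for bw in bandwidths:
        k_ss = gaussian_kernel(source, source, bw).mean()
        k_tt = gaussian_kernel(target, target, bw).mean()
        k_st = gaussian_kernel(source, target, bw).mean()
        total += k_ss + k_tt - 2.0 * k_st
    return total / len(bandwidths)


# Synthetic stand-ins for semantic features from two projects (hypothetical).
rng = np.random.default_rng(0)
source_feats = rng.normal(0.0, 1.0, size=(200, 8))
target_same = rng.normal(0.0, 1.0, size=(200, 8))     # same distribution
target_shifted = rng.normal(1.5, 1.0, size=(200, 8))  # shifted distribution

mmd_same = multi_kernel_mmd2(source_feats, target_same)
mmd_shifted = multi_kernel_mmd2(source_feats, target_shifted)
```

In a transfer setting of this kind, such a term would typically be added to the classification loss so that training pulls the source and target feature distributions together. How the loss is weighted and where the layer sits in the network are design choices the abstract does not specify.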
Keywords