A Suitable AST Node Granularity and Multi-Kernel Transfer Convolutional Neural Network for Cross-Project Defect Prediction

Jiehan Deng; Lu Lu; Shaojian Qiu; Yangpeng Ou

doi:10.1109/access.2020.2985780

IEEE Access (Jan 2020)

Affiliations

Jiehan Deng: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Lu Lu: Modern Industrial Technology Research Institute, South China University of Technology, Zhongshan, China
Shaojian Qiu: College of Mathematics and Informatics, South China Agricultural University, Guangzhou, China
Yangpeng Ou: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

DOI: https://doi.org/10.1109/access.2020.2985780
Journal volume & issue: Vol. 8
pp. 66647 – 66661

Abstract

Cross-project defect prediction (CPDP) is a feasible way to perform software defect prediction (SDP) when a project lacks historical data. Recent CPDP approaches have employed deep learning techniques to better exploit the information in a program's abstract syntax trees (ASTs). However, the granularity of the AST nodes and the difference in data distribution between projects may harm prediction performance, and many CPDP studies have not taken these factors into consideration. To handle these issues, this paper explores a better AST node granularity and proposes a CPDP framework based on multi-kernel transfer convolutional neural networks. Specifically, for AST node granularity, we examine the differences among three AST node granularities and compare the prediction performance of each granularity on several prediction models. For the CPDP framework, we first parse the program source code into ASTs and encode the AST nodes into numerical vectors using an embedding technique. Second, to mine transferable semantic features, the encoded ASTs are fed into a convolutional neural network, in which a multi-kernel matching layer is added to minimize the data distribution divergence between the source and target projects. Finally, to make use of the information in the handcrafted features, the semantic features mined from the ASTs are combined with the handcrafted features to form the joint features used for CPDP. We evaluate our approach on 110 CPDP tasks formed from 11 open-source projects, and the results show that the proposed CPDP method outperforms most deep learning-based approaches.
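
The abstract does not spell out how the multi-kernel matching layer measures distribution divergence. One common choice for this kind of layer is a multi-kernel maximum mean discrepancy (MK-MMD) between source-project and target-project feature vectors. The sketch below is a minimal illustration under that assumption, not the authors' implementation: the function names, the Gaussian kernels, the equal kernel weights, and the bandwidth values are all hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(x, y, bandwidths):
    """Equally weighted average of Gaussian kernels (assumed weighting)."""
    return sum(gaussian_kernel(x, y, bw) for bw in bandwidths) / len(bandwidths)


def mean_kernel(xs, ys, bandwidths):
    """Mean multi-kernel value over all pairs drawn from xs and ys."""
    total = sum(multi_kernel(x, y, bandwidths) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mk_mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0)):
    """Biased estimate of the squared MK-MMD between two feature samples.

    source, target: lists of feature vectors (hypothetical stand-ins for
    the semantic features of source-project and target-project files).
    """
    return (mean_kernel(source, source, bandwidths)
            + mean_kernel(target, target, bandwidths)
            - 2.0 * mean_kernel(source, target, bandwidths))


# Toy features: one sample close to the origin, one shifted away from it.
source_features = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
shifted_features = [[3.0, 3.0], [3.1, 2.9], [2.9, 3.1]]

same_project_gap = mk_mmd_squared(source_features, source_features)
cross_project_gap = mk_mmd_squared(source_features, shifted_features)
```

In a transfer setting, a term like this would typically be added to the classification loss with a weighting factor, so that training pulls the source and target feature distributions together while still fitting the source labels. The weighting and the exact estimator are design choices that the abstract leaves open.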

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
