Deep Learning With Customized Abstract Syntax Tree for Bug Localization

Hongliang Liang; Lu Sun; Meilin Wang; Yuxing Yang

doi:10.1109/access.2019.2936948

IEEE Access (Jan 2019)

Affiliations

Hongliang Liang: School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
Lu Sun: School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
Meilin Wang: China Information Technology Security Evaluation Center, Beijing, China
Yuxing Yang: School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China

DOI: https://doi.org/10.1109/access.2019.2936948
Journal volume & issue: Vol. 7
pp. 116309 – 116320

Abstract

Given a bug report, bug localization techniques can help developers automatically locate potentially buggy files. Information retrieval and deep learning approaches have been applied to bug localization by extracting lexical features from bug reports and syntactic features from source code files, but they fail to utilize the structural and semantic information of source code files. In this paper, we present CAST, a bug localization system that exploits deep learning and customized abstract syntax trees (ASTs) of programs to locate potentially buggy source files automatically and effectively. Specifically, CAST extracts both lexical semantics from bug reports (e.g., words) and source files (e.g., method names) and program semantics from source files (e.g., ASTs). Moreover, CAST enhances the tree-based convolutional neural network (TBCNN) model with customized ASTs, which distinguish user-defined methods from system-provided ones to reflect their different contributions to defects. Furthermore, customized ASTs group syntactic entities with similar semantics and prune those with little or redundant semantics to improve learning performance. Experimental results on four widely used software projects show that CAST significantly outperforms state-of-the-art methods in locating buggy source files.
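The three AST customizations named in the abstract (separating user-defined from system-provided methods, grouping entities with similar semantics, and pruning entities with little or redundant semantics) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the actual grouping or pruning rules, so the `GROUPS` and `PRUNED` tables, the `UserCall`/`SystemCall` labels, and all function names below are hypothetical, and Python's standard `ast` module stands in for whatever parser CAST uses.

```python
import ast

# Hypothetical grouping table: node types with similar semantics share one label.
GROUPS = {"For": "Loop", "While": "Loop", "AsyncFor": "Loop",
          "If": "Branch", "IfExp": "Branch"}
# Hypothetical prune list: node types assumed to carry little or redundant semantics.
PRUNED = {"Load", "Store", "Del", "Expr", "Pass"}


def user_defined_names(tree):
    """Collect the names of functions defined in the parsed file itself."""
    return {n.name for n in ast.walk(tree)
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))}


def label(node, user_funcs):
    """Return the customized label for one AST node."""
    if isinstance(node, ast.Call):
        func = node.func
        callee = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        # Calls to functions defined in this file count as user-defined;
        # everything else is treated as system-provided (a simplification).
        return "UserCall" if callee in user_funcs else "SystemCall"
    name = type(node).__name__
    return GROUPS.get(name, name)


def customize(node, user_funcs):
    """Build a (label, children) tuple, splicing pruned nodes out of the tree."""
    children = []
    for child in ast.iter_child_nodes(node):
        sub = customize(child, user_funcs)
        if type(child).__name__ in PRUNED:
            children.extend(sub[1])  # drop the node, keep its descendants
        else:
            children.append(sub)
    return (label(node, user_funcs), children)


def labels(tree):
    """Flatten a customized tree into a pre-order list of labels."""
    out = [tree[0]]
    for child in tree[1]:
        out.extend(labels(child))
    return out


SOURCE = """
def helper(x):
    return x + 1

def main(items):
    for i in items:
        print(helper(i))
"""

module = ast.parse(SOURCE)
custom = customize(module, user_defined_names(module))
print(labels(custom))
```

In this sketch the `for` loop is relabeled `Loop`, the call to `helper` becomes `UserCall`, the call to `print` becomes `SystemCall`, and wrapper nodes such as `Expr` and `Load` disappear. A tree of this shape is the kind of input a tree-based model such as TBCNN would then embed; that step is outside the scope of the sketch.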

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
