IEEE Access (Jan 2019)

Deep Learning With Customized Abstract Syntax Tree for Bug Localization

  • Hongliang Liang,
  • Lu Sun,
  • Meilin Wang,
  • Yuxing Yang

DOI
https://doi.org/10.1109/access.2019.2936948
Journal volume & issue
Vol. 7
pp. 116309 – 116320

Abstract

Read online

Given a bug report, bug localization technique can help developers automatically locate potential buggy files. Information retrieval and deep learning approaches have been applied in bug localization by extracting lexical features in bug reports and syntactic features in source code files, though they fail to utilize the structural and semantic information of source code files. In this paper, we present a bug localization system CAST, which exploits deep learning and customized abstract syntax trees of programs to locate potential buggy source files automatically and effectively. Specifically, CAST extracts both lexical semantics in bug reports (e.g., words) and source files (e.g., method names) and program semantics in source files (e.g., abstract syntax tree, AST). Moreover, CAST enhances the tree-based convolutional neural network (TBCNN) model with customized ASTs, which distinguish between user-defined methods and system-provided ones to reflect their contributions leading to defects. Furthermore, customized ASTs group the syntactic entities with similar semantics and prune the ones with little or redundant semantics in order to facilitate the learning performance. Experimental results on four widely-used software projects show that CAST significantly outperforms the state-of-the-art methods in locating the buggy source files.

Keywords