Applied Sciences (Jul 2021)

Two-Pass Technique for Clone Detection and Type Classification Using Tree-Based Convolution Neural Network

  • Young-Bin Jo,
  • Jihyun Lee,
  • Cheol-Jung Yoo

DOI
https://doi.org/10.3390/app11146613
Journal volume & issue
Vol. 11, no. 14
p. 6613

Abstract

Appropriate reliance on code clones significantly reduces development costs and hastens the development process. Reckless cloning, in contrast, reduces code quality and ultimately adds cost and time. To avoid this scenario, many researchers have proposed methods for clone detection and refactoring. Existing techniques, however, reliably detect only clones that are entirely identical or that differ only in their identifiers, and they do not provide clone-type information. This paper proposes a two-pass clone classification technique that uses a tree-based convolution neural network (TBCNN) to detect multiple clone types, including clones that are not wholly identical or to which only small changes have been made, and to classify them automatically by type. The method was validated with BigCloneBench, a well-known and widely used dataset of cloned code. The experimental results show that the technique detected clones with an average recall and precision of 96%, and classified clones by type with an average recall and precision of 78%.

Keywords