IEEE Access (Jan 2019)

A Systematic Review on Code Clone Detection

  • Qurat Ul Ain,
  • Wasi Haider Butt,
  • Muhammad Waseem Anwar,
  • Farooque Azam,
  • Bilal Maqbool

DOI
https://doi.org/10.1109/ACCESS.2019.2918202
Journal volume & issue
Vol. 7
pp. 86121 – 86144

Abstract

Read online

Code cloning refers to the duplication of source code. It is the most common way of reusing source code in software development. If a bug is identified in one segment of code, all the similar segments need to be checked for the same bug. Consequently, this cloning process may lead to bug propagation that significantly affects the maintenance cost. By considering this problem, code clone detection (CCD) appears as an active area of research. Consequently, there is a strong need to investigate the latest techniques, trends, and tools in the domain of CCD. Therefore, in this paper, we comprehensively inspect the latest tools and techniques utilized for the detection of code clones. Particularly, a systematic literature review (SLR) is performed to select and investigate 54 studies pertaining to CCD. Consequently, six categories are defined to incorporate the selected studies as per relevance, i.e., textual approaches (12), lexical approaches (8), tree-based approaches (3), metric-based approaches (7), semantic approaches (7), and hybrid approaches (17). We identified and analyzed 26 CCD tools, i.e., 13 existing and 13 proposed/developed. Moreover, 62 open-source subject systems whose source code is utilized for the CCD are presented. It is concluded that there exist several studies to detect type1, type2, type3, and type4 clones individually. However, there is a need to develop novel approaches with complete tool support in order to detect all four types of clones collectively. Furthermore, it is also required to introduce more approaches to simplify the development of a program dependency graph (PDG) while dealing with the detection of the type4 clones.

Keywords