IEEE Access (Jan 2021)

An Effective Semantic Code Clone Detection Framework Using Pairwise Feature Fusion

  • Abdullah Sheneamer
  • Swarup Roy
  • Jugal Kalita

DOI
https://doi.org/10.1109/ACCESS.2021.3079156
Journal volume & issue
Vol. 9
pp. 84828 – 84844

Abstract

Code clones. In this work, we propose a novel machine learning framework for the automated detection of all four types of clones. The features extracted from a pair of code blocks are combined so that a clone can be detected with respect to a reference block. We use AST and PDG features of both code blocks and fuse the two feature vectors using three alternative schemes to prepare labelled training samples. We use six state-of-the-art classification models, including a deep convolutional neural network, to assess the prediction performance of our scheme. To assess the effectiveness of our framework, we use seven datasets and compare its performance with five state-of-the-art clone detectors. Comparing a large number of machine learning techniques, both ANN and non-ANN, on these features, we establish that fusing AST and PDG features gives competitive results with deep learning as well as with boosted tree algorithms, and we find that boosted tree algorithms such as XGBoost are quite competitive in clone detection. Experimental results demonstrate that our approach outperforms existing clone detection methods in terms of prediction accuracy.
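The pairwise fusion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the three fusion alternatives, so the sketch assumes three common choices (concatenation, element-wise absolute difference, element-wise product). The helper names `block_features`, `fuse_pair`, and `make_sample`, the layout of a block's vector (AST features followed by PDG features), and the toy feature counts are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a code block is a pair (AST features, PDG features).
Block = Tuple[Sequence[float], Sequence[float]]


def block_features(ast_feats: Sequence[float], pdg_feats: Sequence[float]) -> List[float]:
    """Assumed layout: a block's vector is its AST features followed by its PDG features."""
    return list(ast_feats) + list(pdg_feats)


def fuse_pair(a: Sequence[float], b: Sequence[float], scheme: str) -> List[float]:
    """Fuse two block vectors into one pair vector.

    The three schemes are illustrative assumptions, not the paper's definitions.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    if scheme == "concat":   # order-dependent: a's features, then b's
        return list(a) + list(b)
    if scheme == "absdiff":  # symmetric: |a_i - b_i|
        return [abs(x - y) for x, y in zip(a, b)]
    if scheme == "product":  # symmetric: a_i * b_i
        return [x * y for x, y in zip(a, b)]
    raise ValueError(f"unknown fusion scheme: {scheme}")


def make_sample(block_a: Block, block_b: Block, is_clone: bool, scheme: str) -> Tuple[List[float], int]:
    """Build one labelled training sample: (fused vector, 1 for clone / 0 for non-clone)."""
    fused = fuse_pair(block_features(*block_a), block_features(*block_b), scheme)
    return fused, 1 if is_clone else 0


# Toy example with made-up feature counts: (AST counts, PDG counts) per block.
a = ([2, 0, 1], [3, 1])
b = ([1, 0, 1], [3, 2])
print(make_sample(a, b, True, "absdiff"))  # ([1, 0, 0, 0, 1], 1)
```

The absolute-difference and product schemes give the same vector regardless of which block is treated as the reference, while concatenation keeps the order and doubles the vector length; the resulting samples could then be passed to any classifier, such as a boosted tree model.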

Keywords