IEEE Access (Jan 2024)

A Novel Method for Code Clone Detection Based on Minimally Random Kernel Convolutional Transform

  • Mostefai Abdelkader

DOI
https://doi.org/10.1109/ACCESS.2024.3484995
Journal volume & issue
Vol. 12
pp. 158579 – 158596

Abstract

Effectively detecting code clones is essential for maintaining and evolving software systems. This paper introduces a novel approach, CCDMR, that leverages the minimally random kernel convolutional transform (MiniRocket) to represent source code for clone detection. Our objective is to explore the applicability of this cross-domain representation within software engineering. Specifically, the study examines the combination of kernels designed according to MiniRocket rules, the Proportion of Positive Values (PPV) pooling operator, and advanced classifiers such as XGBoost and Random Forest in the context of clone detection. The proposed method comprises three key steps: first, code pairs are transformed into time series data; second, these time series are represented as feature vectors using MiniRocket, with each vector labeled according to its clone status (clone or non-clone); and third, a classifier is trained on this labeled dataset. The trained classifier is then used to determine whether two new code fragments are clones. The proposed approach was evaluated on the well-known BigCloneBench dataset and compared to seven state-of-the-art tools (SourcererCC, NIL, Code2Vec, ASTNN, FA-AST, RtvNN, and TAILOR) for detecting Type I, Type II, Type III, and Type IV clones. The empirical results indicate that CCDMR, especially its XGBoost variant, outperformed these tools in detecting moderately Type III (MT3) clones and matched the performance of the leading tool, TAILOR, in detecting strongly Type III (ST3) clones. Overall, the average F1 score across all clone types showed that CCDMR was comparable to TAILOR, with particularly strong performance in ST3 and MT3 clone detection.
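
The first two steps of the pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not say how code is turned into a time series, so the character-code encoding in `code_to_series` and the concatenation of the two fragments in `pair_features` are assumptions made here, as are all function names. The kernels follow the published MiniRocket rule (length 9, weights of -1 with three positions set to 2, giving 84 zero-sum kernels), but the bias is fixed at 0 and only two dilations are used, whereas MiniRocket derives biases from the data and uses a larger set of dilations.

```python
from itertools import combinations


def code_to_series(code: str) -> list[float]:
    # Assumption: one value per character (its Unicode code point).
    # The paper's actual encoding is not given in the abstract.
    return [float(ord(ch)) for ch in code]


def minirocket_kernels() -> list[list[float]]:
    # MiniRocket kernel rule: length 9, weights are -1 except for
    # three positions set to 2, so every kernel sums to zero.
    kernels = []
    for positions in combinations(range(9), 3):
        weights = [-1.0] * 9
        for p in positions:
            weights[p] = 2.0
        kernels.append(weights)
    return kernels  # C(9, 3) = 84 kernels


def convolve(series: list[float], weights: list[float], dilation: int) -> list[float]:
    # Dilated convolution without padding; empty if the series is too short.
    span = (len(weights) - 1) * dilation
    return [
        sum(w * series[start + j * dilation] for j, w in enumerate(weights))
        for start in range(len(series) - span)
    ]


def ppv(values: list[float], bias: float) -> float:
    # Proportion of Positive Values: share of outputs exceeding the bias.
    if not values:
        return 0.0
    return sum(1 for v in values if v - bias > 0) / len(values)


def pair_features(code_a: str, code_b: str, dilations=(1, 2), bias: float = 0.0) -> list[float]:
    # Assumption: a code pair is represented by concatenating both series.
    series = code_to_series(code_a) + code_to_series(code_b)
    return [
        ppv(convolve(series, weights, d), bias)
        for weights in minirocket_kernels()
        for d in dilations
    ]


if __name__ == "__main__":
    features = pair_features("int a = b + c;", "int x = y + z;")
    print(len(features), min(features), max(features))
```

In the third step, vectors like these, each labeled clone or non-clone, would form the training set for a classifier such as XGBoost or Random Forest; that step is omitted here to keep the sketch dependency-free.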

Keywords