Big Data and Cognitive Computing (Jul 2025)
LeONet: A Hybrid Deep Learning Approach for High-Precision Code Clone Detection Using Abstract Syntax Tree Features
Abstract
Code duplication, commonly called code cloning, is not inherent to software systems; it arises from factors such as time pressure to meet project deadlines. The resulting “code clones” complicate program structure and increase maintenance costs, and they are categorized into four types: Type-1, Type-2, Type-3, and Type-4. To address these adverse effects, this study introduces LeONet, a hybrid deep learning approach that combines LeNet-5 with Oreo’s Siamese architecture to improve the detection of code clones in software systems. We extracted clone method pairs from the BigCloneBench Java repository and performed feature extraction using Abstract Syntax Trees, which are scalable and accurately represent the syntactic structure of source code. LeONet was compared against other classifiers, including ANN, LeNet-5, Oreo’s Siamese, LightGBM, XGBoost, and Decision Tree. It achieved the highest F1 score among the classifiers tested, 98.12%, and compared favorably against state-of-the-art approaches. These results underscore the potential of hybrid deep learning models and feature extraction techniques for improving the accuracy of code clone detection, and they suggest a promising direction for future research in this area.
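As a rough illustration of why Abstract Syntax Tree features suit clone detection, the sketch below counts AST node types for small code snippets and compares the resulting vectors. It is only a minimal sketch under stated assumptions: it uses Python's standard `ast` module on Python snippets as a stand-in for Java methods from BigCloneBench, and the node-count features, the cosine comparison, and the helper names `ast_node_counts` and `cosine_similarity` are hypothetical choices for illustration, not the feature set or the LeONet classifier described in this paper.

```python
import ast
from collections import Counter
from math import sqrt


def ast_node_counts(source: str) -> Counter:
    """Count how often each AST node type occurs in a code snippet."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two methods that differ only in identifier names (a Type-2 clone).
method_a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
method_b = (
    "def add_all(items):\n    acc = 0\n    for item in items:\n"
    "        acc += item\n    return acc\n"
)
# An unrelated method for contrast.
method_c = "def greet(name):\n    return 'hello ' + name\n"

feat_a, feat_b, feat_c = map(ast_node_counts, (method_a, method_b, method_c))
sim_clone = cosine_similarity(feat_a, feat_b)  # structure is identical
sim_other = cosine_similarity(feat_a, feat_c)  # structure differs
print(f"clone pair: {sim_clone:.3f}, unrelated pair: {sim_other:.3f}")
```

Because renaming identifiers leaves the tree structure unchanged, the two clone methods yield identical node-type counts, whereas the unrelated method does not. A learned classifier would replace the fixed cosine comparison with a trained decision over such paired feature vectors.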
Keywords