LeONet: A Hybrid Deep Learning Approach for High-Precision Code Clone Detection Using Abstract Syntax Tree Features

Thanoshan Vijayanandan; Kuhaneswaran Banujan; Ashan Induranga; Banage T. G. S. Kumara; Kaveenga Koswattage

doi:10.3390/bdcc9070187

Big Data and Cognitive Computing (Jul 2025)

Affiliations

Thanoshan Vijayanandan: Center for Nano Device Fabrication and Characterization (CNFC), Faculty of Technology, Sabaragamuwa University of Sri Lanka, Belihuloya 70140, Sri Lanka
Kuhaneswaran Banujan: Faculty of Science and Engineering, Southern Cross University, Lismore, NSW 2480, Australia
Ashan Induranga: Center for Nano Device Fabrication and Characterization (CNFC), Faculty of Technology, Sabaragamuwa University of Sri Lanka, Belihuloya 70140, Sri Lanka
Banage T. G. S. Kumara: Center for Nano Device Fabrication and Characterization (CNFC), Faculty of Technology, Sabaragamuwa University of Sri Lanka, Belihuloya 70140, Sri Lanka
Kaveenga Koswattage: Center for Nano Device Fabrication and Characterization (CNFC), Faculty of Technology, Sabaragamuwa University of Sri Lanka, Belihuloya 70140, Sri Lanka

DOI: https://doi.org/10.3390/bdcc9070187
Journal volume & issue: Vol. 9, no. 7, p. 187

Abstract

Code duplication, commonly referred to as code cloning, is not inherent in software systems but arises from factors such as time pressure to meet project deadlines. These duplications, or “code clones”, complicate program structure and increase maintenance costs. Code clones are categorized into four types: Type-1, Type-2, Type-3, and Type-4. This study addresses the adverse effects of code clones by introducing LeONet, a hybrid deep learning approach that combines LeNet-5 with Oreo’s Siamese architecture to improve clone detection in software systems. We extracted clone method pairs from the BigCloneBench Java repository. Feature extraction was performed using Abstract Syntax Trees, which are scalable and accurately represent the syntactic structure of the source code. LeONet was compared against other classifiers, including ANN, LeNet-5, Oreo’s Siamese, LightGBM, XGBoost, and Decision Tree, and achieved the highest F1 score among them, 98.12%. It also compared favorably against state-of-the-art approaches. These results underscore the potential of hybrid deep learning models and feature extraction techniques for improving the accuracy of code clone detection, and they suggest a promising direction for future research in this area.
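
To make the idea of Abstract Syntax Tree features concrete, here is a minimal sketch. It is not the authors' pipeline: the study works on Java methods from BigCloneBench, and the abstract does not list the exact features used. As an assumption for illustration only, the sketch uses Python's standard `ast` module and a simple node-type histogram (the helper name `ast_node_histogram` is hypothetical). It shows why syntactic features are robust to identifier renaming, the kind of change that separates a Type-2 clone from its original.

```python
import ast
from collections import Counter


def ast_node_histogram(source: str) -> Counter:
    """Count how often each AST node type occurs in a code snippet.

    Hypothetical helper for illustration; the paper's actual feature set
    is not specified in the abstract.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


# An original method and a copy with only the identifiers renamed.
original = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""

renamed = """
def add_all(items):
    acc = 0
    for x in items:
        acc += x
    return acc
"""

# A structurally unrelated method.
different = """
def first(values):
    return values[0]
"""

hist_original = ast_node_histogram(original)
hist_renamed = ast_node_histogram(renamed)
hist_different = ast_node_histogram(different)

# Renaming identifiers leaves the node-type counts unchanged,
# while a different structure yields a different histogram.
print(hist_original == hist_renamed)
print(hist_original == hist_different)
```

In a learned detector such as the one described above, feature vectors of this general kind would be computed for both methods of a candidate pair and passed to a classifier, rather than compared for exact equality as in this sketch.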

Published in Big Data and Cognitive Computing

ISSN: 2504-2289 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/BDCC
