A Generative AI-Driven Method-Level Semantic Clone Detection Based on the Structural and Semantical Comparison of Methods

Aditi Gupta; Rinkaj Goyal

doi:10.1109/ACCESS.2024.3401770

IEEE Access (Jan 2024)

Affiliations

Aditi Gupta: University School of Information, Communication and Technology, Guru Gobind Singh (GGS) Indraprastha University, New Delhi, India
Rinkaj Goyal: University School of Information, Communication and Technology, Guru Gobind Singh (GGS) Indraprastha University, New Delhi, India

DOI: https://doi.org/10.1109/ACCESS.2024.3401770
Journal volume & issue: Vol. 12
pp. 70773 – 70791

Abstract

Code fragments with identical or similar functionality are called code clones. This study aims to detect semantic clones in Java programs at method-level granularity. To accomplish this, our approach extracts and compares both the semantical and the structural features of Java methods and applies certain heuristics to obtain the final result set. The semantics of a method are often described by its documentation, while its structural details are characterized by the implementation in its body. However, Java methods are not always accompanied by descriptive documentation. To address this, we employed a Generative AI tool, ChatGPT, which can interpret the given code and generate its documentation. To compare the documentation of methods, we used various corpus-based and knowledge-based information retrieval (IR) techniques. To compare the structural details of methods, we tokenized each method’s body and applied an IR technique called VSM (Vector Space Modelling) to the tokenized body. We also extracted and compared certain method-level metrics for this purpose. For the semantical comparison of methods, eight IR variants were formed based on the internal processing requirements of the different IR techniques. The technique proposed in this paper relies on textual and metrics-based analysis of Java programs with few parsing requirements, making it lightweight and less computation-intensive. To examine the efficiency of the proposed technique, we validated it using a semantic clone benchmark. The results show that the proposed technique detects semantic clones with recall values ranging from 67% to 81% and precision values ranging from 60% to 96% across the IR variants explored in our research.
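The structural comparison described in the abstract (tokenizing a method body and applying a vector space model) can be illustrated with a minimal sketch. The abstract does not specify the tokenizer or the term weighting, so everything below is an assumption: the regular expression, the raw term-frequency weights, and the function names `tokenize` and `cosine_similarity` are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the paper's): identifiers,
# integer literals, and single non-alphanumeric symbols.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]")


def tokenize(body: str) -> list[str]:
    """Split a Java method body into a flat list of lexical tokens."""
    return TOKEN_RE.findall(body)


def cosine_similarity(body_a: str, body_b: str) -> float:
    """Cosine similarity of raw term-frequency vectors of two method bodies."""
    vec_a = Counter(tokenize(body_a))
    vec_b = Counter(tokenize(body_b))
    # Dot product over the tokens of the first body; a Counter returns 0
    # for tokens that do not occur in the second body.
    dot = sum(count * vec_b[token] for token, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty body has no direction in the vector space
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    sum_loop = "int s = 0; for (int i = 0; i < n; i++) { s += a[i]; } return s;"
    sum_while = "int t = 0; int j = 0; while (j < n) { t = t + a[j]; j++; } return t;"
    print(round(cosine_similarity(sum_loop, sum_while), 3))
```

A score near 1 means the two bodies use a similar mix of tokens; a score of 0 means they share none. A real pipeline would likely normalize identifiers or apply TF-IDF weighting, which this sketch leaves out.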

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
