IEEE Access (Jan 2024)
GNEM: Comprehensive Similarity Learning With Ensemble Model for Code Search
Abstract
Code search is an important research area in software engineering whose objective is to accurately retrieve the most relevant code for a given query. However, recent deep-learning-based code search models are limited in the scalability and comprehensiveness of their alignment learning: they suffer from the out-of-vocabulary problem, and affinity-matrix-based cross-modal attention may lead to incorrect alignments. In this paper, we propose a novel code search model, the Graph Network Ensemble Model (GNEM), which addresses these challenges by learning diverse alignments and enhancing the similarity representation. GNEM incorporates two graph networks that learn global and fine-grained alignments in order to infer a comprehensive similarity. To evaluate GNEM, we compare it with baseline models on two widely used datasets. The results show that GNEM achieves Top@1 accuracies of 0.649 and 0.702 on the two datasets, surpassing the baseline models by approximately 18.6% and 11.7%, respectively. We also conduct ablation experiments to show the effectiveness of each component of GNEM. Finally, we visualize the attention weights between code and query to illustrate GNEM’s behavior during code search. These visualizations provide insight into how GNEM searches code effectively.
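For readers unfamiliar with the Top@1 metric, the following is a minimal sketch of how Top@k accuracy is commonly computed for code search. It is not taken from the GNEM implementation. It assumes a square matrix of similarity scores in which candidate i is the ground-truth snippet for query i; the function name `top_k_accuracy` and the toy scores are hypothetical.

```python
def top_k_accuracy(similarity, k=1):
    """Fraction of queries whose ground-truth snippet ranks within the top k.

    Assumption: similarity[i][j] is the score between query i and candidate j,
    and candidate i is the correct snippet for query i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the ground truth = number of candidates scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy scores (illustrative only, not results from the paper).
scores = [
    [0.9, 0.1, 0.3],  # query 0: ground truth ranked first
    [0.8, 0.4, 0.2],  # query 1: ground truth ranked second
    [0.2, 0.1, 0.7],  # query 2: ground truth ranked first
]
print(top_k_accuracy(scores, k=1))
print(top_k_accuracy(scores, k=2))
```

In this toy case, two of the three queries rank their ground-truth snippet first, and all three rank it within the top two.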
Keywords