IEEE Access (Jan 2023)

READSUM: Retrieval-Augmented Adaptive Transformer for Source Code Summarization

Yunseok Choi, Cheolwon Na, Hyojun Kim, Jee-Hyong Lee

DOI
https://doi.org/10.1109/ACCESS.2023.3271992
Journal volume & issue
Vol. 11, pp. 51155–51165

Abstract


Code summarization is the process of automatically generating brief, informative summaries of source code to aid software comprehension and maintenance. In this paper, we propose READSUM (REtrieval-augmented ADaptive transformer for source code SUMmarization), a novel model that combines abstractive and extractive approaches. The model generates summaries abstractively, taking into account both the structural and sequential information of the input code, while an extractive component leverages a retrieved summary of similar code to increase the frequency of important keywords. To blend the original code and the retrieved similar code effectively at the embedding stage, we obtain an augmented representation of both through multi-head self-attention. At the encoder stage, we develop a self-attention network that adaptively learns structural and sequential information for these representations. At the decoder stage, we design a fusion network that captures the relation between the original code and the retrieved summary and guides summary generation accordingly. As a result, READSUM extracts important keywords through its extractive component and generates high-quality summaries through its abstractive component, which considers both the structural and sequential information of the source code. We demonstrate the superiority of READSUM through extensive experiments and an ablation study, and we perform a human evaluation to assess the quality of the generated summaries.
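The abstract describes blending the original code and the retrieved similar code at the embedding stage via multi-head self-attention. The snippet below is a minimal sketch of how such a blending layer could look in PyTorch, assuming a shared sub-token embedding and a single joint attention pass; the class and member names (RetrievalAugmentedEmbedding, blend_attn) are illustrative and not taken from the paper's implementation.

```python
import torch
import torch.nn as nn


class RetrievalAugmentedEmbedding(nn.Module):
    """Hypothetical sketch of an embedding-stage blending layer: the original
    code and a retrieved similar code snippet are embedded, concatenated, and
    mixed with multi-head self-attention to yield an augmented representation
    of the original code."""

    def __init__(self, vocab_size: int, d_model: int = 512, num_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.blend_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, code_ids: torch.Tensor, retrieved_ids: torch.Tensor) -> torch.Tensor:
        # Embed both sequences and concatenate them along the length dimension
        # so self-attention can exchange information across the two inputs.
        code_emb = self.embed(code_ids)            # (B, L_code, d_model)
        retrieved_emb = self.embed(retrieved_ids)  # (B, L_ret,  d_model)
        joint = torch.cat([code_emb, retrieved_emb], dim=1)

        # Multi-head self-attention over the joint sequence, followed by a
        # residual connection and layer norm.
        attended, _ = self.blend_attn(joint, joint, joint)
        augmented = self.norm(joint + attended)

        # Keep only the positions of the original code tokens as the
        # retrieval-augmented representation passed on to the encoder.
        return augmented[:, : code_ids.size(1), :]


# Toy usage: batch of 2, vocabulary of 1000 sub-tokens.
if __name__ == "__main__":
    layer = RetrievalAugmentedEmbedding(vocab_size=1000)
    code = torch.randint(0, 1000, (2, 64))
    retrieved = torch.randint(0, 1000, (2, 48))
    print(layer(code, retrieved).shape)  # torch.Size([2, 64, 512])
```

Under these assumptions, the encoder's adaptive self-attention network and the decoder's fusion network would operate on the augmented representation returned here, alongside the retrieved summary tokens.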

Keywords