Zhihui kongzhi yu fangzhen (Aug 2025)
Code summarization based on large model knowledge distillation
Abstract
A code summary is a short natural-language description of source code. Summaries are usually only one sentence long, yet they are a primary way for developers to understand code. Recently, products based on large language models (such as ChatGPT) have demonstrated a strong ability to generate these descriptions. However, to use these tools, programmers must send their code to an untrusted third party for processing (for example, through API calls), which is unacceptable to many organizations. This paper presents an alternative: we use example outputs generated by GPT-3.5 to train an open-source model through a process related to knowledge distillation, enabling a small model (with 350 million parameters) to approach GPT-3.5's performance on the code summarization task.
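The distillation the abstract describes can be pictured as sequence-level (hard-label) knowledge distillation: the student is fine-tuned on (code, summary) pairs whose summaries were produced by the teacher, GPT-3.5. The sketch below is a minimal illustration under assumed choices; the student checkpoint (`Salesforce/codet5-base`), the hyperparameters, and the toy data are placeholders, not the paper's actual configuration.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from datasets import Dataset

# Toy stand-in for a teacher-labeled corpus: each code snippet is paired
# with a summary that GPT-3.5 produced for it (hypothetical examples).
pairs = [
    {"code": "def add(a, b):\n    return a + b",
     "summary": "Adds two numbers and returns the result."},
    {"code": "def is_even(n):\n    return n % 2 == 0",
     "summary": "Checks whether a number is even."},
]

# Placeholder student checkpoint (~220M parameters); the paper's 350M
# model is not named in the abstract, so this is an assumption.
model_name = "Salesforce/codet5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def preprocess(example):
    # Source sequence = code tokens; target = the teacher-written summary.
    enc = tokenizer(example["code"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(example["summary"], truncation=True,
                              max_length=64)["input_ids"]
    return enc

dataset = Dataset.from_list(pairs).map(
    preprocess, remove_columns=["code", "summary"])

args = Seq2SeqTrainingArguments(
    output_dir="distilled-summarizer",
    per_device_train_batch_size=2,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

A practical consequence of this hard-label setup is that the student never needs the teacher's logits, only its generated text, which is exactly what makes distillation from a closed API model such as GPT-3.5 feasible.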
Keywords