Pseudocode Generation from Source Code Using the BART Model

Anas Alokla; Walaa Gad; Waleed Nazih; Mustafa Aref; Abdel-badeeh Salem

doi:10.3390/math10213967

Mathematics (Oct 2022)

Affiliations

Anas Alokla: Faculty of Computers and Information Sciences, Ain Shams University, Abassia, Cairo 11566, Egypt
Walaa Gad: Faculty of Computers and Information Sciences, Ain Shams University, Abassia, Cairo 11566, Egypt
Waleed Nazih: College of Computer Engineering and Sciences, Prince Sattam Bin Abdulaziz University, Al Kharj 11942, Saudi Arabia
Mustafa Aref: Faculty of Computers and Information Sciences, Ain Shams University, Abassia, Cairo 11566, Egypt
Abdel-badeeh Salem: Faculty of Computers and Information Sciences, Ain Shams University, Abassia, Cairo 11566, Egypt

DOI: https://doi.org/10.3390/math10213967
Journal volume & issue: Vol. 10, no. 21
p. 3967

Abstract

In software development, several developers may work on the same program, and bugs may be fixed by someone other than the original author; understanding the source code is therefore essential. Pseudocode helps address this problem, because it makes the source code easier for a developer to understand. Recently, transformer-based pre-trained models have achieved remarkable results in machine translation, a task similar to pseudocode generation. In this paper, we propose a novel approach to automatic pseudocode generation from source code, based on a pre-trained Bidirectional and Auto-Regressive Transformer (BART) model. We fine-tuned two pre-trained BART models (i.e., large and base) using a dataset containing source code and its equivalent pseudocode. In addition, two benchmark datasets (i.e., Django and SPoC) were used to evaluate the proposed model. The proposed model based on the BART large model outperforms other state-of-the-art models in terms of BLEU score by 15% and 27% on the Django and SPoC datasets, respectively.
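
Since the reported gains are stated in BLEU, a small illustration of how such a score can be computed for one generated pseudocode line may help. The sketch below is an assumption-laden simplification, not the evaluation script used in the paper: the abstract does not say which BLEU variant, tokenization, or smoothing was applied, so the function name `bleu`, the whitespace tokenization, the single-reference setup, and the example sentences are all hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference.

    Assumptions (not taken from the paper): whitespace tokenization,
    uniform weights over 1..max_n grams, and a score of 0.0 whenever
    any n-gram order has no match.
    """
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped overlap: a candidate n-gram is credited at most as
        # often as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "if x is greater than zero return x"
generated = "if x is greater than zero return y"
print(round(bleu(reference, generated), 4))
```

In practice, corpus-level BLEU aggregates n-gram counts over all sentences before taking precisions, so per-sentence values like this one are only indicative.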

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics
