MTLink: Adaptive multi-task learning based pre-trained language model for traceability link recovery between issues and commits

Yang Deng; Bangchao Wang; Qiang Zhu; Junping Liu; Jiewen Kuang; Xingfu Li

Journal of King Saud University: Computer and Information Sciences (Feb 2024)

Affiliations

Yang Deng: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
Bangchao Wang: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China; Engineering Research Center of Hubei Province for Clothing Information, Wuhan, China; Corresponding author at: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China.
Qiang Zhu: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China; Engineering Research Center of Hubei Province for Clothing Information, Wuhan, China
Junping Liu: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China; Engineering Research Center of Hubei Province for Clothing Information, Wuhan, China
Jiewen Kuang: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China
Xingfu Li: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan, China

Journal volume & issue: Vol. 36, no. 2, p. 101958

Abstract

Traceability links between issues and commits play a significant role in software maintenance tasks by enhancing developers’ observability in practice; recovering them is known as issue-commit link recovery (ILR). Recent advances in large language models, particularly pre-trained models, have improved the effectiveness of automated ILR. However, the large parameter counts and long training times of these models pose challenges in large software projects. In addition, existing methods often overlook the associations and distinctions among artifacts, which leads to erroneous links. To mitigate these problems, this paper proposes a novel link recovery method called MTLink. It uses multi-teacher knowledge distillation (MTKD) to compress the model and employs an adaptive multi-task strategy to reduce information loss and improve link accuracy. Experiments are conducted on four open-source projects. The results show that (i) MTLink outperforms state-of-the-art methods; (ii) multi-teacher knowledge distillation maintains accuracy despite the reduction in model size; and (iii) the adaptive multi-task tracing method effectively handles confusion caused by similar artifacts and balances the tasks. In conclusion, MTLink offers an efficient solution for ILR in software traceability. The code is available at https://zenodo.org/records/10321150.
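
For readers unfamiliar with multi-teacher knowledge distillation, the general idea can be sketched as follows. This is a minimal, generic illustration and not the MTLink implementation: the abstract does not specify the loss, so the uniform teacher weighting, the temperature value, and the function names `softmax` and `multi_teacher_kd_loss` are all assumptions made here for illustration. The sketch blends the softened predictions of several teachers into one target distribution and measures how far the student's softened prediction is from it using KL divergence.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(student_logits, teacher_logits_list,
                          weights=None, temperature=2.0):
    """KL(blended teachers || student) on softened distributions.

    Assumption: teachers are blended with uniform weights unless
    `weights` is given; this is NOT taken from the MTLink paper.
    """
    n_teachers = len(teacher_logits_list)
    if weights is None:
        weights = [1.0 / n_teachers] * n_teachers

    student = softmax(student_logits, temperature)
    teachers = [softmax(t, temperature) for t in teacher_logits_list]

    # Weighted average of the teacher distributions, class by class.
    target = [
        sum(w * probs[i] for w, probs in zip(weights, teachers))
        for i in range(len(student))
    ]

    kl = sum(t * math.log(t / s) for t, s in zip(target, student) if t > 0)
    # Conventional T^2 factor keeps gradient scale comparable across temperatures.
    return kl * temperature ** 2


# Hypothetical logits for a binary "link / no link" decision.
student = [0.2, 0.8]
teachers = [[0.1, 1.5], [0.3, 1.1]]
loss = multi_teacher_kd_loss(student, teachers)
```

In practice a distillation term like this is usually combined with the ordinary supervised loss on the labeled links; how MTLink weights its teachers and tasks is described in the paper and the linked code, not here.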

Published in Journal of King Saud University: Computer and Information Sciences

ISSN: 1319-1578 (Print)
Publisher: Elsevier
Country of publisher: Saudi Arabia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.journals.elsevier.com/journal-of-king-saud-university-computer-and-information-sciences/
