Journal of King Saud University: Computer and Information Sciences (Feb 2024)
MTLink: Adaptive multi-task learning based pre-trained language model for traceability link recovery between issues and commits
Abstract
Traceability links between issues and commits (issue-commit links recovery (ILR)) play a significant role in software maintenance tasks by enhancing developers’ observability in practice. Recent advancements in large language models, particularly pre-trained models, have improved the effectiveness of automated ILR. However, these models’ large parameter sizes and extended training time pose challenges in large software projects. Besides, existing methods often overlook the association and distinction among artifacts, leading to the generation of erroneous links. To mitigate these problems, this paper proposes a novel link recovery method called MTLink. It utilizes multi-teacher knowledge distillation (MTKD) to compress the model and employs an adaptive multi-task strategy to reduce information loss and improve link accuracy. Experiments are conducted on four open-source projects. The results show that (i) MTLink outperforms state-of-the-art methods; (ii) The multi-teacher knowledge distillation maintains accuracy despite model size reduction; (iii) The adaptive multi-task tracing method effectively handles confusion caused by similar artifacts and balances each task. In conclusion, MTLink offers an efficient solution for ILR in software traceability. The code is available at https://zenodo.org/records/10321150.