Journal of King Saud University: Computer and Information Sciences (Jul 2024)
An empirical study on the state-of-the-art methods for requirement-to-code traceability link recovery
Abstract
Requirements-to-code traceability link recovery (RC-TLR) establishes connections between requirements and target code artifacts, which is critical for the maintenance and evolution of large software systems. However, to the best of our knowledge, no existing experimental study focuses on state-of-the-art (SOTA) methods for the RC-TLR problem, and the field lacks uniform benchmarks for evaluating new methods. We developed a framework to identify SOTA methods using the Systematic Literature Review method and applied it to research in the RC-TLR field from 2018 to 2023. By replicating experiments with 6 methods on 13 datasets, we observed that, among information retrieval-based methods, the Close Relations between Target artifacts-based method (CRT), TraceAbility Recovery by Consensual biTerms (TAROT), and Fine-grained TLR (FTLR) performed well on the COEST dataset, while Combining Part-Of-Speech with information-retrieval techniques (Conpos) and TAROT achieved promising results on large datasets. Among machine learning-based methods, Random Forest consistently exhibited strong performance on all datasets. We hope that this study can provide a comparative benchmark for performance evaluation in the RC-TLR field. The resource repository that we have established is expected to reduce the workload of researchers in performance analysis and to promote progress in the field.