IEEE Access (Jan 2021)

Large Scale Evaluation of Natural Language Processing Based Test-to-Code Traceability Approaches

  • Andras Kicsi,
  • Viktor Csuvik,
  • Laszlo Vidacs

DOI
https://doi.org/10.1109/ACCESS.2021.3083923
Journal volume & issue
Vol. 9
pp. 79089 – 79104

Abstract

Read online

Traceability information can be crucial for software maintenance, testing, automatic program repair, and various other software engineering tasks. Customarily, a vast amount of test code is created for systems to maintain and improve software quality. Today’s test systems may contain tens of thousands of tests. Finding the parts of code tested by each test case is usually a difficult and time-consuming task without the help of the authors of the tests or at least clear naming conventions. Recent test-to-code traceability research has employed various approaches but textual methods as standalone techniques were investigated only marginally. The naming convention approach is a well-regarded method among developers. Besides their often only voluntary use, however, one of its main weaknesses is that it can only identify one-to-one links. With the use of more versatile text-based methods, candidates could be ranked by similarity, thus producing a number of possible connections. Textual methods also have their disadvantages, even machine learning techniques can only provide semantically connected links from the text itself, these can be refined with the incorporation of structural information. In this paper, we investigate the applicability of three text-based methods both as a standalone traceability link recovery technique and regarding their combination possibilities with each other and with naming conventions. The paper presents an extensive evaluation of these techniques using several source code representations and meta-parameter settings on eight real, medium-sized software systems with a combined size of over 1.25 million lines of code. Our results suggest that with suitable settings, text-based approaches can be used for test-to-code traceability purposes, even where naming conventions were not followed.

Keywords