Large Scale Evaluation of Natural Language Processing Based Test-to-Code Traceability Approaches

Andras Kicsi; Viktor Csuvik; Laszlo Vidacs

doi:10.1109/ACCESS.2021.3083923

IEEE Access (Jan 2021)

Affiliations

Andras Kicsi: MTA-SZTE Research Group on Artificial Intelligence, University of Szeged, Szeged, Hungary
Viktor Csuvik: MTA-SZTE Research Group on Artificial Intelligence, University of Szeged, Szeged, Hungary
Laszlo Vidacs: MTA-SZTE Research Group on Artificial Intelligence, University of Szeged, Szeged, Hungary

Journal volume & issue: Vol. 9
pp. 79089 – 79104

Abstract

Traceability information can be crucial for software maintenance, testing, automatic program repair, and various other software engineering tasks. A large amount of test code is customarily written to maintain and improve software quality, and today's test suites may contain tens of thousands of tests. Without help from the authors of the tests, or at least clear naming conventions, finding the parts of the code tested by each test case is usually a difficult and time-consuming task. Recent test-to-code traceability research has employed various approaches, but textual methods as standalone techniques have been investigated only marginally. The naming convention approach is well regarded among developers. Besides the fact that conventions are often followed only voluntarily, however, one of its main weaknesses is that it can identify only one-to-one links. More versatile text-based methods can rank candidates by similarity, thus producing a number of possible connections. Textual methods have their own disadvantages: even machine learning techniques can only derive semantically connected links from the text itself, although these links can be refined by incorporating structural information. In this paper, we investigate the applicability of three text-based methods, both as standalone traceability link recovery techniques and in combination with each other and with naming conventions. We present an extensive evaluation of these techniques using several source code representations and meta-parameter settings on eight real, medium-sized software systems with a combined size of over 1.25 million lines of code. Our results suggest that, with suitable settings, text-based approaches can be used for test-to-code traceability, even where naming conventions were not followed.
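To make the contrast between one-to-one naming-convention links and similarity-ranked candidates concrete, the following minimal Python sketch compares the two ideas on a toy example. It is an illustration under stated assumptions, not the authors' implementation: the "Test" prefix/suffix rule, the identifier splitting, the stop list, the class names, and the use of plain bag-of-words cosine similarity (a simple stand-in for the three text-based methods evaluated in the paper) are all hypothetical choices.

```python
import math
import re
from collections import Counter

# Hypothetical stop list: the word "test" says nothing about the tested class.
STOP_WORDS = {"test"}


def split_identifier(identifier):
    """Split a camelCase/PascalCase/snake_case identifier into lowercase words."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)
    return [w.lower() for w in words]


def tokenize(text):
    """Turn whitespace-separated identifiers into a filtered bag of words."""
    tokens = []
    for identifier in text.split():
        tokens.extend(split_identifier(identifier))
    return [t for t in tokens if t not in STOP_WORDS]


def naming_convention_link(test_class, code_classes):
    """One-to-one link (assumed rule): strip a 'Test' suffix or prefix, require an exact match."""
    candidates = []
    if test_class.endswith("Test"):
        candidates.append(test_class[: -len("Test")])
    if test_class.startswith("Test"):
        candidates.append(test_class[len("Test"):])
    for name in candidates:
        if name in code_classes:
            return name
    return None  # convention not followed: no link at all


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(test_text, code_docs):
    """Score every production class against the test text and sort best-first."""
    query = Counter(tokenize(test_text))
    scores = [(name, cosine(query, Counter(tokenize(doc)))) for name, doc in code_docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy corpus: each class is represented by its name and method names.
CODE_DOCS = {
    "ShoppingCart": "ShoppingCart addItem removeItem totalPrice",
    "PriceCalculator": "PriceCalculator applyDiscount totalPrice",
    "UserAccount": "UserAccount login logout",
}

print(naming_convention_link("ShoppingCartTest", CODE_DOCS))   # ShoppingCart
print(naming_convention_link("CartBehaviourTest", CODE_DOCS))  # None
for name, score in rank_candidates("CartBehaviourTest testAddItemUpdatesTotalPrice", CODE_DOCS):
    print(f"{name}: {score:.3f}")  # ShoppingCart: 0.717, PriceCalculator: 0.401, UserAccount: 0.000
```

Here the second test class does not follow the assumed naming convention, so the convention-based matcher finds no link, while the similarity ranking still places ShoppingCart first and keeps PriceCalculator as a lower-scored alternative connection.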

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639