IEEE Access (Jan 2024)

A Large-Scale Empirical Investigation Into Cross-Project Flaky Test Prediction

  • Angelo Afeltra
  • Alfonso Cannavale
  • Fabiano Pecorelli
  • Valeria Pontillo
  • Fabio Palomba

DOI
https://doi.org/10.1109/ACCESS.2024.3458184
Journal volume & issue
Vol. 12
pp. 131255 – 131265

Abstract


Test flakiness arises when a test case exhibits inconsistent behavior, alternating between passing and failing when executed against the same code. Previous research has shown the practical significance of the problem, conducting empirical studies into the nature of flakiness and proposing automated techniques for its detection. Machine learning models have emerged as a promising approach for flaky test prediction. However, existing research has predominantly focused on within-project scenarios, where models are trained and tested using data from a single project. In contrast, little is known about how flaky test prediction models may be adapted to software projects lacking sufficient historical data for effective prediction. In this paper, we address this gap by proposing a large-scale assessment of flaky test prediction in cross-project scenarios, i.e., situations where predictive models are trained using data coming from external projects. Leveraging a dataset of 1,385 flaky tests from 29 open-source projects, we examine static test flakiness prediction models and evaluate feature- and instance-based filtering methods for cross-project prediction. Our study highlights the difficulties of using cross-project flaky test data and underscores the importance of filtering methods in improving prediction accuracy. Notably, we find that the TrAdaBoost filtering method significantly reduces data heterogeneity, leading to an F-Measure of 70%.
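
To illustrate the instance-based transfer idea behind TrAdaBoost in a cross-project setting, the sketch below shows a minimal, hypothetical reweighting loop in Python: source-project test instances that conflict with the target project are progressively down-weighted, while misclassified target-project instances gain weight. All identifiers (tradaboost, X_src, y_src, the synthetic feature data) are illustrative assumptions, not the authors' implementation or feature set.

import numpy as np
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier


def tradaboost(X_src, y_src, X_tgt, y_tgt, n_rounds=10):
    """Fit a sequence of weak learners, reweighting source-project rows
    so that instances resembling the target project dominate later rounds."""
    n_src, n_tgt = len(X_src), len(X_tgt)
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    w = np.ones(n_src + n_tgt) / (n_src + n_tgt)
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n_src) / n_rounds))
    learners, betas = [], []
    for _ in range(n_rounds):
        clf = DecisionTreeClassifier(max_depth=3)
        clf.fit(X, y, sample_weight=w / w.sum())
        err = np.abs(clf.predict(X) - y).astype(float)   # 0/1 per-row error
        eps = np.sum(w[n_src:] * err[n_src:]) / np.sum(w[n_src:])
        eps = min(max(eps, 1e-10), 0.49)                 # keep the update well defined
        beta_t = eps / (1.0 - eps)
        w[:n_src] *= beta_src ** err[:n_src]             # shrink misclassified source rows
        w[n_src:] *= beta_t ** -err[n_src:]              # boost misclassified target rows
        learners.append(clf)
        betas.append(beta_t)
    return learners, betas


def predict(learners, betas, X):
    """Weighted vote over the second half of the rounds (TrAdaBoost convention)."""
    start = len(learners) // 2
    votes, norm = np.zeros(len(X)), 0.0
    for clf, b in zip(learners[start:], betas[start:]):
        weight = np.log(1.0 / b)
        votes += weight * clf.predict(X)
        norm += weight
    return (votes >= norm / 2).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for static test metrics of source and target projects.
    X_src = rng.normal(size=(300, 5))
    y_src = (X_src[:, 0] > 0).astype(int)
    X_tgt = rng.normal(loc=0.4, size=(40, 5))
    y_tgt = (X_tgt[:, 0] > 0.4).astype(int)
    X_test = rng.normal(loc=0.4, size=(60, 5))
    y_test = (X_test[:, 0] > 0.4).astype(int)
    learners, betas = tradaboost(X_src, y_src, X_tgt, y_tgt)
    print("F1 on target test set:", f1_score(y_test, predict(learners, betas, X_test)))

The shallow decision trees and the clamped error term are arbitrary choices for the sketch; the key point is that the per-instance weights act as the filtering mechanism that reduces heterogeneity between source and target projects.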

Keywords