IEEE Access (Jan 2024)
On the Effectiveness of Feature Selection Techniques in the Context of ML-Based Regression Test Prioritization
Abstract
Regression testing is essential for maintaining software functionality in continuous integration (CI) systems, but it becomes increasingly costly as software complexity grows. Machine learning-based Regression Test Prioritization (RTP) techniques prioritize test cases by their likelihood of failure, aiming to detect failures early and optimize resource use. However, the features used to train machine learning (ML) models in the current state of the art vary widely across datasets, highlighting the need for further research to identify effective feature sets for RTP. Moreover, individual feature selection techniques are often biased toward specific features depending on the dataset. In this study, we therefore explored an ensemble technique that combines three ML-based feature selection techniques to identify and refine the key features that improve test case prioritization. These techniques were applied to four tree-based ML models using data from 15 large-scale open-source software projects. Our analysis identified the features most predictive of failures and assessed their impact on RTP. The results show that a refined subset containing only one-third of the original features achieves similar RTP performance, and in some cases up to a 10% improvement. We also empirically evaluated the cost of the three selection methods and report the ML models' performance with the refined feature sets. These findings underscore the potential of integrating advanced feature selection methods into RTP processes.
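The paper's exact selectors and datasets are not reproduced here; as a minimal illustration of the idea of ensembling feature selection techniques, the sketch below (assuming scikit-learn, synthetic data, and three common selectors: mutual information, random-forest importance, and recursive feature elimination) aggregates per-selector feature rankings and keeps one-third of the features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for RTP data: features describing test cases,
# label = whether the test failed. The paper's real datasets differ.
X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=4, random_state=0)

def to_ranks(scores):
    """Convert importance scores to ranks (0 = most important)."""
    order = np.argsort(-scores)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(scores))
    return ranks

# Selector 1: mutual information between each feature and the failure label.
mi_ranks = to_ranks(mutual_info_classif(X, y, random_state=0))

# Selector 2: impurity-based importance from a tree ensemble.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_ranks = to_ranks(rf.feature_importances_)

# Selector 3: recursive feature elimination with a linear model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1).fit(X, y)
rfe_ranks = rfe.ranking_ - 1  # RFE ranks start at 1; shift to 0-based

# Ensemble: average the three rank vectors, keep the top third of features.
mean_ranks = (mi_ranks + rf_ranks + rfe_ranks) / 3.0
k = X.shape[1] // 3
selected = np.argsort(mean_ranks)[:k]
print("selected feature indices:", sorted(selected.tolist()))
```

Rank averaging is one simple aggregation rule; voting or intersection-based schemes are equally plausible readings of "ensemble" here, and the reduced feature set would then be fed to the tree-based RTP models for evaluation.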
Keywords