IEEE Access (Jan 2024)
Software Fault Prediction Using Cross-Project Analysis: A Study on Class Imbalance and Model Generalization
Abstract
Software fault prediction is a critical aspect of software engineering aimed at improving software quality and reliability. However, it faces significant challenges, including the class imbalance issue in fault data and the need for robust predictive models that generalize well across different projects. In this research, we delve into these challenges and investigate the impact of class imbalance and model generalization on software fault prediction using cross-project analysis. Our study addresses three primary research questions: Firstly, we examine the critical issue of class imbalance in fault prediction, which poses a significant hurdle to accurate model performance. Through extensive experimentation with various classifiers on diverse datasets from different software projects, we highlight the variations in classifier performance and the necessity of addressing class imbalance for reliable predictions. Secondly, we evaluate the reliability of cross-project prediction, aiming to understand how effectively predictive models trained on one project can generalize to predict faults in other projects. We demonstrate the importance of training with datasets sharing similar characteristics with the target project for achieving reliable cross-project prediction. Thirdly, we analyze the impact of increasing training samples from different projects on prediction accuracy, emphasizing the benefits of utilizing cross-project analysis to enhance predictive model performance. In addition to addressing these research questions, we provide a comprehensive comparison of classifier performance metrics, including accuracy, precision, recall, and F1 Score. Our findings not only shed light on the challenges and opportunities in software fault prediction but also emphasize the importance of considering class imbalance and model generalization for developing robust and reliable fault prediction models. This research contributes to advancing the field by providing insights into effective modeling approaches and highlighting the motivation behind addressing these challenges.
Keywords