Journal of King Saud University: Computer and Information Sciences (Nov 2024)
Framework to improve software effort estimation accuracy using novel ensemble rule
Abstract
This study refines software effort estimation (SEE) to improve project outcomes in a rapidly evolving software industry. Accurate estimation is a cornerstone of project success: it helps avoid budget overruns and reduces the risk of project failure. The proposed framework addresses three issues that are critical for accurate estimation: handling missing or inadequate data, selecting key features, and improving the software effort model. It comprises three methods: the Novel Incomplete Value Imputation Model (NIVIM), a hybrid model that combines Correlation-based Feature Selection with a meta-heuristic algorithm (CFS-Meta), and the Heterogeneous Ensemble Model (HEM). Together, these methods improve the robustness and accuracy of SEE by handling missing data, optimizing feature selection, and integrating diverse predictive models across varying project scenarios. The framework significantly reduces imputation and feature selection overhead, while the ensemble approach optimizes model performance through dynamic weighting and meta-learning. The result is a lower mean absolute error (MAE) and reduced computational complexity, which makes the framework more effective for diverse software datasets. NIVIM is designed to address the incomplete datasets prevalent in SEE. By integrating a synthetic data methodology through a Variational Auto-Encoder (VAE), the model incorporates both contextual relevance and intrinsic project features, significantly enhancing estimation precision. Comparative analyses show that NIVIM outperforms existing models such as VAE, GAIN, K-NN, and MICE, achieving statistically significant improvements across six benchmark datasets, with average root mean squared error (RMSE) improvements ranging from 11.05% to 17.72% and MAE improvements from 9.62% to 21.96%.
The second method, CFS-Meta, balances global optimization with local search, substantially enhancing predictive capability. Compared with single and hybrid feature selection models, CFS-Meta reduces mean squared error (MSE) by up to 25.61%. In terms of MAE, it improves on the hybrid PSO-SA model by 10%, the hybrid ABC-SA model by 11.38%, and the hybrid Tabu-GA and hybrid ACO-COA models by 12.42% and 12.703%, respectively. The third method is an ensemble effort estimation (EEE) model that combines diverse standalone models through a Dynamic Weight Adjustment-stacked combination (DWSC) rule. Tested against international benchmarks and industry datasets, HEM improves on the standalone models by an average of 21.8% (Pred()) and on the homogeneous ensemble model by 15% (Pred()). This methodology contributes to software project management (SPM) through advanced predictive modeling and sets a new benchmark for software engineering effort estimation.
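For readers less familiar with the accuracy measures named above, the following is a minimal sketch of how MAE, RMSE, and Pred are conventionally computed in the effort estimation literature. It is not the authors' evaluation code. The function names, the sample effort values, and the `level` threshold are illustrative assumptions; the abstract does not state the threshold used for Pred(), nor the exact formula behind the reported percentage improvements, so `relative_improvement` shows only one common convention.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted effort."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted effort."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def pred(actual, predicted, level):
    """Fraction of projects whose magnitude of relative error is <= level.

    `level` is a fraction (0.25 means 25%). The threshold used in the
    paper is not given in the abstract, so it is left as a parameter.
    """
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) / a <= level)
    return hits / len(actual)


def relative_improvement(baseline_error, new_error):
    """Percentage reduction in an error metric relative to a baseline.

    Assumed convention; the abstract does not define its formula.
    """
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical effort values (e.g. person-hours), for illustration only.
actual_effort = [100, 200, 400, 800]
predicted_effort = [110, 140, 400, 1000]

print("MAE:", mae(actual_effort, predicted_effort))
print("RMSE:", rmse(actual_effort, predicted_effort))
print("Pred(0.25):", pred(actual_effort, predicted_effort, 0.25))
```

Lower MAE and RMSE indicate better estimates, whereas a higher Pred value indicates that more projects fall within the chosen error tolerance.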