IEEE Access (Jan 2022)
Adaptive Discretization Using Golden Section to Aid Outlier Detection for Software Development Effort Estimation
Abstract
Software engineering researchers have worked along different dimensions to obtain better software effort estimates, including improving dataset quality. In this research, we specifically investigated the effectiveness of outlier removal in improving the estimation performance of five machine learning (ML) methods (Support Vector Regression, Random Forest, Ridge Regression, K-Nearest Neighbor, and Gradient Boosting Machines) for software development effort estimation (SDEE). We propose a novel discretization method based on the Golden Section, dubbed Golden Section based Adaptive Discretization (GSAD), to identify the optimal number of outliers in an SDEE dataset. The results show that removing an optimal number of outliers is important for improving estimates. Moreover, the results obtained with GSAD have been compared with those of interquartile range (IQR) and Cook's distance based outlier identification methods on four datasets: ISBSG Release 2021, UCP, NASA93, and China. The empirical results confirm that the performance of ML-based SDEE methods generally improves when GSAD is employed, and that GSAD is competitive with other prevalent outlier identification methods.
Keywords