Frontiers in Physics (Mar 2024)
Dimensionality reduction and machine learning based model of software cost estimation
Abstract
Software Cost Estimation (SCE) is one of the research priorities and challenges in the construction of cyber-physical-social systems (CPSSs). In a CPSS, it is urgent to process environmental and social information accurately and to use it to guide social practice. In response to the problems of low prediction accuracy, poor robustness, and poor interpretability in SCE, this paper proposes an SCE model based on an Autoencoder and Random Forest. First, the project data are preprocessed: outliers are removed, and regression trees are built to fill in missing attributes. Second, an Autoencoder is constructed to reduce the dimensionality of the factors that affect software cost. Subsequently, the model is trained and validated using the XGBoost framework on three datasets (COCOMO81, Albrecht, and Desharnais) and compared with common cost prediction models. The experimental results show that the MMRE, MdMRE, and PRED (0.25) values of the proposed model on the COCOMO81 dataset reached 0.21, 0.16, and 0.71, respectively. Compared with other models, the proposed model achieved significant improvements in accuracy and robustness.
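For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how MMRE, MdMRE, and PRED (0.25) are commonly computed in the cost-estimation literature. It assumes the usual definitions based on the magnitude of relative error, MRE = |actual - predicted| / actual; the abstract does not spell out the formulas, so the paper's own definitions may differ in detail. The function names and the sample effort values are hypothetical and are not taken from COCOMO81 or any other dataset.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for each project (assumes actual > 0)."""
    return [abs(a - p) / a for a, p in zip(actual, predicted)]


def mmre(actual, predicted):
    """Mean magnitude of relative error: lower is better."""
    return mean(mre(actual, predicted))


def mdmre(actual, predicted):
    """Median magnitude of relative error: less sensitive to outliers than MMRE."""
    return median(mre(actual, predicted))


def pred(actual, predicted, level=0.25):
    """Fraction of projects whose MRE is at most `level`: higher is better."""
    errors = mre(actual, predicted)
    return sum(e <= level for e in errors) / len(errors)


# Hypothetical effort values, for illustration only.
actual = [100.0, 200.0, 400.0, 50.0]
predicted = [110.0, 150.0, 400.0, 80.0]

print(mmre(actual, predicted), mdmre(actual, predicted), pred(actual, predicted))
```

Under these definitions, lower MMRE and MdMRE and higher PRED (0.25) indicate better estimates, which is the sense in which the values 0.21, 0.16, and 0.71 above should be read.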
Keywords