IEEE Access (Jan 2022)

Toward Improving the Efficiency of Software Development Effort Estimation via Clustering Analysis

  • Vo Van Hai,
  • Ho Le Thi Kim Nhung,
  • Zdenka Prokopova,
  • Radek Silhavy,
  • Petr Silhavy

DOI
https://doi.org/10.1109/ACCESS.2022.3185393
Journal volume & issue
Vol. 10
pp. 83249 – 83264

Abstract

Introduction: Precise estimation of software effort is a significant difficulty that project managers encounter during software development. Inaccurate forecasting leads to either overestimating or underestimating software effort, which can be detrimental for stakeholders. The International Function Point Users Group’s Function Point Analysis (FPA) method is one of the most important methods for software effort estimation. However, the practice of applying the FPA method in the same fashion across all software areas needs to be reexamined. Aim: We propose a model for evaluating the influence of data clustering on software development effort estimation and then finding the best clustering method. We call this model the effort estimation using machine learning applied to the clusters (EEAC) model. Method: We cluster the dataset according to each clustering method and then apply the FPA and EEAC methods to the resulting clusters for effort estimation. The clustering methods used in this study comprise five categorical variable criteria (Development Platform, Industry Sector, Language Type, Organization Type, and Relative Size) and the k-means clustering algorithm. Results: The experimental results show that the estimation accuracy obtained with clustering consistently outperforms the accuracy obtained without clustering for both the FPA and EEAC methods. Notably, in terms of the RMSE, the average improvement rate from using clustering as opposed to no clustering was highest at 58.06% for the FPA method; for the EEAC method, this number reached 65.53%. The Industry Sector categorical variable achieves the best estimation accuracy compared to the other clustering criteria and k-means clustering. The improvement in accuracy in terms of the RMSE when applying this criterion is 63.68% for the FPA method and 72.02% for the EEAC method. Conclusion: Better results are obtained with dataset clustering than without clustering for both the FPA and EEAC methods. The Industry Sector is the most suitable clustering method among the tested clustering methods.
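
The cluster-then-estimate comparison described in the Method and Results can be illustrated with a minimal sketch. It is not the authors' FPA or EEAC implementation: the estimator is a stand-in (a pooled hours-per-function-point rate), the project records are invented toy data, the field names `sector`, `size`, and `effort` are hypothetical, and the improvement rate is assumed to be the relative RMSE reduction against the non-clustered baseline. Errors are computed in-sample for brevity.

```python
from collections import defaultdict
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_rate(rows):
    """Stand-in estimator: pooled effort per function point for a group."""
    return sum(r["effort"] for r in rows) / sum(r["size"] for r in rows)


def evaluate(rows, key=None):
    """RMSE when one rate is fitted per cluster (or one global rate if key is None)."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key] if key else "all"].append(r)
    actual, predicted = [], []
    for members in groups.values():
        rate = fit_rate(members)
        for r in members:
            actual.append(r["effort"])
            predicted.append(rate * r["size"])
    return rmse(actual, predicted)


# Invented toy data: size in function points, effort in person-hours.
projects = [
    {"sector": "banking", "size": 100, "effort": 1000},
    {"sector": "banking", "size": 200, "effort": 2000},
    {"sector": "telecom", "size": 100, "effort": 500},
    {"sector": "telecom", "size": 200, "effort": 1000},
]

baseline = evaluate(projects)                 # no clustering
clustered = evaluate(projects, key="sector")  # cluster by a categorical variable
# Assumed definition: relative RMSE reduction, in percent.
improvement = (baseline - clustered) / baseline * 100
print(f"baseline={baseline:.2f} clustered={clustered:.2f} improvement={improvement:.2f}%")
```

Replacing `key="sector"` with another categorical field, or with labels produced by a k-means run, would give the corresponding comparison for the other clustering criteria; a held-out split would be needed for a fair accuracy estimate.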

Keywords