Applying Machine Learning to Estimate the Effort and Duration of Individual Tasks in Software Projects

Andre O. Sousa; Daniel T. Veloso; Henrique M. Goncalves; Joao Pascoal Faria; Joao Mendes-Moreira; Ricardo Graca; Duarte Gomes; Rui Nuno Castro; Pedro Castro Henriques

doi:10.1109/ACCESS.2023.3307310

IEEE Access (Jan 2023)

Affiliations

Andre O. Sousa: Faculty of Engineering, University of Porto, Porto, Portugal
Daniel T. Veloso: Faculty of Engineering, University of Porto, Porto, Portugal
Henrique M. Goncalves: Faculty of Engineering, University of Porto, Porto, Portugal
Joao Pascoal Faria: Faculty of Engineering, University of Porto, Porto, Portugal
Joao Mendes-Moreira: Faculty of Engineering, University of Porto, Porto, Portugal
Ricardo Graca: Fraunhofer Portugal AICOS, Porto, Portugal
Duarte Gomes: Strongstep, Porto, Portugal
Rui Nuno Castro: Fraunhofer Portugal AICOS, Porto, Portugal
Pedro Castro Henriques: Strongstep, Porto, Portugal

Journal volume & issue: Vol. 11
pp. 89933 – 89946

Abstract

Software estimation is a vital yet challenging project management activity. Various methods, from empirical to algorithmic, have been developed to fit different development contexts, from plan-driven to agile. Recently, machine learning techniques have shown potential in this realm but remain underexplored, especially for individual task estimation. We investigate the use of machine learning techniques to predict task effort and duration in software projects, in order to assess their applicability and effectiveness in production environments, identify the best-performing algorithms, and pinpoint key input variables (features) for the predictions. We conducted experiments with datasets of various sizes and structures exported from three project management tools used by partner companies. For each dataset, we trained regression models to predict the effort and duration of individual tasks using eight machine learning algorithms. The models were validated using k-fold cross-validation and evaluated with several metrics. Ensemble algorithms such as Random Forest, Extra Trees Regressor, and XGBoost consistently outperformed non-ensemble ones across the three datasets. However, estimation accuracy and feature importance varied significantly across datasets, with the Mean Magnitude of Relative Error (MMRE) ranging from 0.11 to 9.45 across datasets and target variables. Nevertheless, even in the worst-performing dataset, effort estimates aggregated to the project level showed good accuracy, with MMRE = 0.23. Machine learning algorithms, especially ensemble ones, seem to be a viable option for estimating the effort and duration of individual tasks in software projects. However, the quality of the estimates and the relevant features may depend largely on the characteristics of the available datasets and underlying projects. Even when individual estimates are inaccurate, project-level aggregates may still be accurate because errors compensate for one another.
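The abstract reports accuracy as MMRE and notes that project-level aggregates can be far more accurate than individual task estimates because over- and underestimates cancel out. Below is a minimal sketch of that effect, assuming MMRE is the mean of |actual - predicted| / actual and that project-level estimates are obtained by summing task estimates per project (the abstract does not spell out the exact aggregation procedure). The function names `mmre` and `project_level_mmre`, the (project, actual, predicted) tuple layout, and the effort values are all hypothetical and purely illustrative; they are not the authors' code or data.

```python
from collections import defaultdict
from statistics import mean


def mmre(actual, predicted):
    """Mean Magnitude of Relative Error: mean of |actual - predicted| / actual."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def project_level_mmre(tasks):
    """Sum actual and predicted effort per project, then compute MMRE over projects.

    `tasks` is an iterable of (project_id, actual, predicted) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # project_id -> [sum actual, sum predicted]
    for project_id, actual, predicted in tasks:
        totals[project_id][0] += actual
        totals[project_id][1] += predicted
    actual_sums = [t[0] for t in totals.values()]
    predicted_sums = [t[1] for t in totals.values()]
    return mmre(actual_sums, predicted_sums)


# Made-up effort values in hours (NOT data from the paper).
tasks = [
    ("A", 2.0, 6.0),   # overestimated
    ("A", 8.0, 4.0),   # underestimated
    ("A", 5.0, 5.5),
    ("B", 1.0, 3.0),   # overestimated
    ("B", 10.0, 8.0),  # underestimated
]

task_mmre = mmre([t[1] for t in tasks], [t[2] for t in tasks])
proj_mmre = project_level_mmre(tasks)
print(f"task-level MMRE:    {task_mmre:.3f}")
print(f"project-level MMRE: {proj_mmre:.3f}")
```

On these invented values the task-level MMRE is 0.960, while the project-level MMRE is about 0.017, because within each project the over- and underestimates largely offset each other. Note that MMRE is undefined when an actual value is zero, so real task data would need such records filtered or handled separately.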

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639