PLoS ONE (Jan 2024)
Analysis of learning curves in predictive modeling using exponential curve fitting with an asymptotic approach.
Abstract
The availability of large volumes of data has considerably eased concerns about having enough data instances for machine learning experiments. Nevertheless, in certain contexts, limited data availability may still demand distinct strategies and efforts. Analyzing COVID-19 predictions at the beginning of the pandemic raised a question: how much data is needed to make reliable predictions? At what point does the volume of data provide a better understanding of the disease's evolution and, in turn, support reliable forecasts? Given these questions, the objective of this study is to analyze learning curves obtained from predicting the incidence of COVID-19 in Brazilian States using ARIMA models with limited available data. To this end, a retrospective exploration of COVID-19 incidence across the Brazilian States was performed. After data acquisition and modeling, the model errors were assessed through a learning curve analysis. Asymptotic exponential curve fitting enabled the evaluation of the errors at different points in time, reflecting the increase in available data. For a comprehensive understanding of the results at distinct stages of the time evolution, the average derivative of the curves and the equilibrium points were calculated, with the aim of identifying the convergence of the ARIMA models to a stable pattern. We observed differences in average derivatives and equilibrium values among the multiple samples. While both metrics ultimately confirmed the convergence to stability, the equilibrium points were more sensitive to changes in the models' accuracy and provided a better indication of the learning progress. The proposed method for constructing learning curves enabled consistent monitoring of prediction results, providing the evidence-based insight required for informed decision-making.
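
As an illustration of the kind of analysis summarized above, the following is a minimal sketch and not the study's implementation. The abstract does not state the functional form, so the sketch assumes an asymptotic exponential y = a + b·exp(-c·x), treats the asymptote a as the equilibrium point, and takes the average derivative as the mean slope of the fitted curve over the observed interval. The function names, the grid of candidate rates, and the synthetic error values are all hypothetical. A simple scan over c with a linear least-squares solve for a and b stands in for a general nonlinear optimizer.

```python
import math


def fit_asymptotic_exponential(x, y, rates):
    """Fit y ~ a + b * exp(-c * x) by scanning candidate rates c.

    For a fixed c the model is linear in (a, b), so those two values come
    from ordinary least squares; the c with the smallest squared error wins.
    """
    n = len(x)
    y_mean = sum(y) / n
    best = None
    for c in rates:
        z = [math.exp(-c * xi) for xi in x]
        z_mean = sum(z) / n
        szz = sum((zi - z_mean) ** 2 for zi in z)
        if szz == 0.0:
            continue  # degenerate regressor, skip this rate
        b = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y)) / szz
        a = y_mean - b * z_mean
        sse = sum((a + b * zi - yi) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    if best is None:
        raise ValueError("no usable candidate rate")
    _, a, b, c = best
    return a, b, c


def average_derivative(a, b, c, x_start, x_end):
    """Mean slope of the fitted curve on [x_start, x_end]."""
    def f(t):
        return a + b * math.exp(-c * t)
    return (f(x_end) - f(x_start)) / (x_end - x_start)


# Synthetic, noise-free prediction errors for illustration only
# (assumed values, not data from the study).
weeks = list(range(1, 21))
errors = [5.0 + 40.0 * math.exp(-0.3 * w) for w in weeks]
rates = [k / 100 for k in range(1, 101)]  # hypothetical grid: 0.01 to 1.00

a, b, c = fit_asymptotic_exponential(weeks, errors, rates)
equilibrium = a  # error level the curve approaches as data accumulates
slope = average_derivative(a, b, c, weeks[0], weeks[-1])
print(f"equilibrium={equilibrium:.3f}, rate={c:.2f}, average derivative={slope:.3f}")
```

Under these assumptions, an average derivative close to zero together with observed errors close to the equilibrium value would suggest that adding more data no longer changes the model error much.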