International Journal of Infectious Diseases (Mar 2022)
Forecasting SARS-CoV-2 Incidence in Ontario Municipalities with Statistical and Algorithmic Modeling and Ensembles
Abstract
Purpose: In this study, a variety of statistical and algorithmic models were applied to forecast COVID-19 incidence in two Canadian cities, Wellington-Dufferin-Guelph (WDG) and Toronto, Ontario. The aim was to explore and compare the predictive capacity of each approach in two regions whose daily incidence differs because of population size and which may therefore require different analytical approaches to inform public health.
Methods & Materials: The dataset consisted of daily COVID-19 incidence in WDG and Toronto, Ontario. The data were split into a training set (March 13, 2020, to June 17, 2021) and a validation set (June 18, 2021, to July 8, 2021). Models fitted to the training data were assessed on the validation data. The predictors included the effective reproductive number (Re), holidays, the type of variant (i.e., Alpha, Beta, Gamma, Delta), whether a mutation common to a variant was detected, and the cumulative numbers of first and second vaccination doses. The statistical models employed were Generalized Linear Autoregressive Moving Average (GLARMA), Seasonal Autoregressive Integrated Moving Average (SARIMA), and regression with ARIMA errors. The two machine learning algorithms were Neural Network Autoregression (NNAR) and Random Forest (RF). A hybrid model combining the statistical and algorithmic approaches (ARIMA-Boosted) was also explored. Ensembles combining several of the models were then generated to investigate whether they improved predictive performance. Performance was assessed via Root Mean Squared Error (RMSE) and Mean Absolute Scaled Error (MASE).
Results: In WDG, regression with ARIMA errors achieved respectable forecast accuracy (RMSE = 3.50, MASE = 0.71). Ensembles provided a marginal gain in forecast accuracy (RMSE = 3.48, MASE = 0.67). In Toronto, SARIMA produced the most accurate forecasts (RMSE = 8.14, MASE = 0.52), whereas ensembles did not improve accuracy (RMSE = 8.57, MASE = 0.58).
Conclusion: Models based on observed associations (i.e., statistical modeling) provided more accurate forecasts of epidemic/pandemic trajectory than data-driven algorithmic modeling (i.e., machine learning). This finding was consistent in both WDG and Toronto, Ontario. While ensemble forecasts may slightly improve forecast accuracy, the computational expense did not justify their application in the current examples.
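
For readers unfamiliar with the two accuracy measures reported above, the following is a minimal sketch of how RMSE and MASE can be computed for a validation window, together with a simple equal-weight ensemble of two forecasts. The abstract does not state how MASE was scaled or how the ensembles were weighted; this sketch assumes the common non-seasonal scaling (in-sample mean absolute error of a one-step naive forecast on the training data) and an unweighted mean. All function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error over the validation window."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mase(actual, forecast, train, m=1):
    """Mean absolute scaled error.

    Assumption: the scale is the in-sample mean absolute error of a
    naive forecast with lag m on the training series (m=1 is the
    non-seasonal case). Values below 1 mean the forecast beats the
    naive benchmark on average.
    """
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [abs(train[t] - train[t - m]) for t in range(m, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae / scale


def ensemble_mean(forecasts):
    """Equal-weight ensemble: average the forecasts day by day (an assumption)."""
    return [sum(day) / len(day) for day in zip(*forecasts)]


# Toy daily case counts (hypothetical, for illustration only).
train = [10, 12, 11, 15, 14]
actual = [13, 16, 15]
forecast_a = [14, 14, 16]
forecast_b = [12, 17, 15]

combined = ensemble_mean([forecast_a, forecast_b])
for name, fc in [("model A", forecast_a), ("model B", forecast_b), ("ensemble", combined)]:
    print(name, round(rmse(actual, fc), 3), round(mase(actual, fc, train), 3))
```

Because MASE is scaled by a naive benchmark, it can be compared across series of very different magnitude, which is relevant when the same models are evaluated on a smaller and a larger region.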