PLoS ONE (Jan 2022)
Interval forecasts of weekly incident and cumulative COVID-19 mortality in the United States: A comparison of combining methods.
Abstract
BackgroundA combined forecast from multiple models is typically more accurate than an individual forecast, but there are few examples of studies of combining in infectious disease forecasting. We investigated the accuracy of different ways of combining interval forecasts of weekly incident and cumulative coronavirus disease-2019 (COVID-19) mortality.MethodsWe considered weekly interval forecasts, for 1- to 4-week prediction horizons, with out-of-sample periods of approximately 18 months ending on 8 January 2022, for multiple locations in the United States, using data from the COVID-19 Forecast Hub. Our comparison involved simple and more complex combining methods, including methods that involve trimming outliers or performance-based weights. Prediction accuracy was evaluated using interval scores, weighted interval scores, skill scores, ranks, and reliability diagrams.ResultsThe weighted inverse score and median combining methods performed best for forecasts of incident deaths. Overall, the leading inverse score method was 12% better than the mean benchmark method in forecasting the 95% interval and, considering all interval forecasts, the median was 7% better than the mean. Overall, the median was the most accurate method for forecasts of cumulative deaths. Compared to the mean, the median's accuracy was 65% better in forecasting the 95% interval, and 43% better considering all interval forecasts. For all combining methods except the median, combining forecasts from only compartmental models produced better forecasts than combining forecasts from all models.ConclusionsCombining forecasts can improve the contribution of probabilistic forecasting to health policy decision making during epidemics. The relative performance of combining methods depends on the extent of outliers and the type of models in the combination. The median combination has the advantage of being robust to outlying forecasts. Our results support the Hub's use of the median and we recommend further investigation into the use of weighted methods.