BMJ Open (Dec 2020)

Comparison of ARIMA model and XGBoost model for prediction of human brucellosis in mainland China: a time-series study

  • Mirxat Alim,
  • Guo-Hua Ye,
  • De-Sheng Huang,
  • Bao-Sen Zhou

DOI
https://doi.org/10.1136/bmjopen-2020-039676
Journal volume & issue
Vol. 10, no. 12

Abstract

Objectives Human brucellosis is a public health problem endangering health and property in China. Predicting the trend and seasonality of human brucellosis is of great significance for its prevention. In this study, the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model were compared to determine which was more suitable for predicting the occurrence of brucellosis in mainland China.

Design Time-series study.

Setting Mainland China.

Methods Data on human brucellosis in mainland China were provided by the National Health and Family Planning Commission of China. The data were divided into a training set and a test set. The training set was composed of the monthly incidence of human brucellosis in mainland China from January 2008 to June 2018, and the test set was composed of the monthly incidence from July 2018 to June 2019. The mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE) were used to evaluate model fitting and prediction performance.

Results The number of human brucellosis patients in mainland China increased from 30 002 in 2008 to 40 328 in 2018. The original time series showed an increasing trend and an obvious seasonal distribution. For the training set, the MAE, RMSE and MAPE of the ARIMA(0,1,1)×(0,1,1)₁₂ model were 338.867, 450.223 and 10.323, respectively, and the MAE, RMSE and MAPE of the XGBoost model were 189.332, 262.458 and 4.475, respectively. For the test set, the MAE, RMSE and MAPE of the ARIMA(0,1,1)×(0,1,1)₁₂ model were 529.406, 586.059 and 17.676, respectively, and the MAE, RMSE and MAPE of the XGBoost model were 249.307, 280.645 and 7.643, respectively.

Conclusions The XGBoost model performed better than the ARIMA model. The XGBoost model is more suitable for predicting cases of human brucellosis in mainland China.
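The three error measures named in the Methods can be sketched as follows. The abstract does not give their formulas, so this uses the standard textbook definitions; expressing MAPE as a percentage is an assumption, and the function names and sample numbers are hypothetical and are not data from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: like MAE, but penalises large errors more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, as a percentage (assumed convention).

    Undefined when any actual value is zero.
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Made-up monthly case counts for illustration only (not from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(mae(actual, predicted))
print(rmse(actual, predicted))
print(mape(actual, predicted))
```

Lower values indicate a closer fit on all three measures, which is the basis on which the two models are compared on the training and test sets.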