Predicting the incidence of infectious diarrhea with symptom surveillance data using a stacking-based ensembled model

Pengyu Wang; Wangjian Zhang; Hui Wang; Congxing Shi; Zhiqiang Li; Dahu Wang; Lei Luo; Zhicheng Du; Yuantao Hao

doi:10.1186/s12879-024-09138-x

BMC Infectious Diseases (Feb 2024)

Affiliations

Pengyu Wang: Department of Medical Statistics, School of Public Health & Center for Health Information Research & Sun Yat-sen Global Health Institute, Sun Yat-sen University
Wangjian Zhang: Department of Medical Statistics, School of Public Health & Center for Health Information Research & Sun Yat-sen Global Health Institute, Sun Yat-sen University
Hui Wang: Department of Infectious Disease Control and Prevention, Guangzhou Center for Disease Control and Prevention
Congxing Shi: Department of Medical Statistics, School of Public Health & Center for Health Information Research & Sun Yat-sen Global Health Institute, Sun Yat-sen University
Zhiqiang Li: Department of Medical Statistics, School of Public Health & Center for Health Information Research & Sun Yat-sen Global Health Institute, Sun Yat-sen University
Dahu Wang: Department of Infectious Disease Control and Prevention, Guangzhou Center for Disease Control and Prevention
Lei Luo: Department of Infectious Disease Control and Prevention, Guangzhou Center for Disease Control and Prevention
Zhicheng Du: Department of Medical Statistics, School of Public Health & Center for Health Information Research & Sun Yat-sen Global Health Institute, Sun Yat-sen University
Yuantao Hao: Peking University Center for Public Health and Epidemic Preparedness & Response

DOI: https://doi.org/10.1186/s12879-024-09138-x
Journal volume & issue: Vol. 24, no. 1
pp. 1 – 11

Abstract

Background: Infectious diarrhea remains a major public health problem worldwide. This study used a stacking ensemble to develop a predictive model for the incidence of infectious diarrhea, aiming to achieve better prediction performance.

Methods: Based on surveillance data on infectious diarrhea cases, relevant symptoms, and meteorological factors in Guangzhou from 2016 to 2021, we developed four base prediction models using artificial neural networks (ANN), long short-term memory networks (LSTM), support vector regression (SVR), and extreme gradient boosting regression trees (XGBoost), which were then combined by stacking to obtain the final prediction model. All models were evaluated with three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE).

Results: Base models that incorporated symptom surveillance data together with the weekly number of infectious diarrhea cases achieved lower RMSEs, MAEs, and MAPEs than models that incorporated meteorological data together with the weekly number of infectious diarrhea cases. The LSTM had the best prediction performance among the four base models, with an RMSE, MAE, and MAPE of 84.85, 57.50, and 15.92%, respectively. The stacking ensemble model outperformed all four base models, with an RMSE, MAE, and MAPE of 75.82, 55.93, and 15.70%, respectively.

Conclusions: Incorporating symptom surveillance data could improve the predictive accuracy of infectious diarrhea prediction models, and symptom surveillance data were more effective than meteorological data in enhancing model performance. Using stacking to combine multiple prediction models alleviated the difficulty of selecting the optimal model and yielded a model with better performance than the base models.
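For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of their standard textbook definitions in plain Python. It is not the authors' code: the function names and the sample weekly counts are hypothetical and chosen only for illustration, and the abstract does not state how the study handled weeks with zero cases (where MAPE is undefined).

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, in case counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: like MAE, but penalizes large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is non-zero; a zero would raise
    ZeroDivisionError here.
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical weekly case counts and forecasts (not data from the study).
actual = [100, 200, 400]
predicted = [110, 180, 400]

print(f"MAE:  {mae(actual, predicted):.2f}")
print(f"RMSE: {rmse(actual, predicted):.2f}")
print(f"MAPE: {mape(actual, predicted):.2f}%")
```

RMSE and MAE are expressed in the same units as the weekly case counts, whereas MAPE is scale-free, which is why the abstract reports the first two as plain numbers and the third as a percentage.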

Published in BMC Infectious Diseases

ISSN: 1471-2334 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Infectious and parasitic diseases
Website: https://bmcinfectdis.biomedcentral.com
