BMC Infectious Diseases (Apr 2018)

Machine learning methods reveal the temporal pattern of dengue incidence using meteorological factors in metropolitan Manila, Philippines

  • Thaddeus M. Carvajal,
  • Katherine M. Viacrusis,
  • Lara Fides T. Hernandez,
  • Howell T. Ho,
  • Divina M. Amalin,
  • Kozo Watanabe

DOI
https://doi.org/10.1186/s12879-018-3066-0
Journal volume & issue
Vol. 18, no. 1
pp. 1 – 15

Abstract

Background: Several studies have applied ecological factors such as meteorological variables to develop models that predict the temporal pattern of dengue incidence or occurrence. Although many studies have investigated this premise, their modeling approaches differ, and each typically uses only a single statistical technique. This raises the question of which technique is robust and reliable. Hence, our study compares the predictive accuracy of four modeling techniques for the temporal pattern of dengue incidence in Metropolitan Manila as influenced by meteorological factors: (a) General Additive Modeling, (b) Seasonal Autoregressive Integrated Moving Average with exogenous variables, (c) Random Forest, and (d) Gradient Boosting.

Methods: Dengue incidence and meteorological data (flood, precipitation, temperature, southern oscillation index, relative humidity, wind speed and direction) for Metropolitan Manila from January 1, 2009 to December 31, 2013 were obtained from the respective government agencies. Two types of datasets were used in the analysis: observed meteorological factors (MF) and their corresponding delayed or lagged effects (LG). These datasets were then subjected to the four modeling techniques, and the predictive accuracy and variable importance of each technique were calculated and evaluated.

Results: Among the statistical modeling techniques, Random Forest showed the best predictive accuracy. Moreover, the delayed or lagged effects of the meteorological variables proved to be the best dataset for this purpose. Thus, the Random Forest model with delayed meteorological effects (RF-LG) was deemed the best among all assessed models. Relative humidity was the most important meteorological factor in the best model.

Conclusion: The study showed that each statistical modeling technique generates different predictive outcomes, and that the Random Forest model with delayed meteorological effects is the best at predicting the temporal pattern of dengue incidence in Metropolitan Manila. The study also identified relative humidity, along with rainfall and temperature, as an important meteorological factor that can influence this temporal pattern.
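
As a rough illustration of the lagged (LG) dataset idea described in the Methods, the sketch below builds delayed copies of each meteorological variable and defines one possible accuracy score. It is not the authors' code: the function names `make_lagged` and `rmse`, the column names, the weekly time step, the lag choices, the use of root-mean-square error, and all toy values are assumptions made only for this example.

```python
from math import sqrt


def make_lagged(rows, lags):
    """Build a lagged dataset from time-ordered observations (oldest first).

    Each output record keeps the current case count and, for every
    meteorological variable, its value k steps earlier for each k in lags.
    Records without a full history are dropped.
    """
    max_lag = max(lags)
    lagged = []
    for t in range(max_lag, len(rows)):
        record = {"cases": rows[t]["cases"]}
        for name in rows[t]:
            if name == "cases":
                continue
            for k in lags:
                record[f"{name}_lag{k}"] = rows[t - k][name]
        lagged.append(record)
    return lagged


def rmse(observed, predicted):
    """Root-mean-square error, one common measure of predictive accuracy."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return sqrt(sum(squared) / len(squared))


# Toy values for illustration only; these are not data from the study.
weekly = [
    {"cases": 10, "rain_mm": 5.0, "rel_humidity": 70.0},
    {"cases": 12, "rain_mm": 20.0, "rel_humidity": 78.0},
    {"cases": 18, "rain_mm": 35.0, "rel_humidity": 82.0},
    {"cases": 25, "rain_mm": 15.0, "rel_humidity": 80.0},
    {"cases": 30, "rain_mm": 8.0, "rel_humidity": 75.0},
]

lagged = make_lagged(weekly, lags=[1, 2])
```

Each record in `lagged` could then be passed to any of the four modeling techniques in place of the same-period (MF) values, with a score such as `rmse` computed on held-out periods to compare them.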

Keywords