Big Data and Cognitive Computing (Oct 2024)
An XGBoost Approach to Predictive Modelling of Rift Valley Fever Outbreaks in Kenya Using Climatic Factors
Abstract
Rift Valley fever (RVF), a highly climate-sensitive zoonotic disease, has been reported frequently in Kenya. Although multiple empirical analyses have shown that machine learning methods can outperform classical time series models in forecasting time series data, there is limited evidence of their application to predicting disease outbreaks in Africa. The recent literature reports several applications of machine learning in support of decision-making within healthcare and public health, but little on the use of the XGBoost model for predicting disease outbreaks. This study investigated the relationship between the occurrence of RVF and several climatic and environmental variables, including humidity, clay content, elevation, slope, and rainfall. Among the provinces of Kenya, the incidence of RVF was highest in the Rift Valley (26.80%) and Eastern (20.60%) regions. The correlation matrix revealed only weak linear dependence between the individual variables and RVF cases; the highest correlation, observed for rainfall, was a mere 0.02903. An XGBoost model trained on these variables achieved strong performance, with an AUC of 0.8908, an accuracy of 99.74%, a precision of 99.75%, and a recall of 99.99%. Feature importance analysis identified rainfall as the most significant predictor. These findings align with previous studies demonstrating the importance of weather conditions in RVF outbreaks. The results indicate that advanced machine learning models that consider several climatic variables together can significantly enhance the prediction and management of RVF incidence.
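As a reading aid, the quantities named in the abstract (a Pearson correlation between one variable and RVF cases, and accuracy, precision, and recall for a binary classifier) can be sketched in a few lines of plain Python. This is a minimal illustration and not the study's code: the helper names `pearson` and `classification_metrics` and the toy values are assumptions made for the example, and no XGBoost model is fitted here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical toy values, NOT data from the study.
rainfall = [10.0, 35.0, 5.0, 60.0, 20.0, 80.0]
rvf_case = [0, 0, 0, 1, 0, 1]   # observed labels (assumed)
predicted = [0, 1, 0, 1, 0, 1]  # labels from some classifier (assumed)

r = pearson(rainfall, rvf_case)
metrics = classification_metrics(rvf_case, predicted)
print(f"correlation(rainfall, RVF) = {r:.4f}")
print(metrics)
```

A correlation coefficient captures only linear association with a single variable, whereas a tree-based model such as XGBoost can also exploit non-linear effects and interactions between variables. That is one reason weak pairwise correlations and strong classification metrics can coexist.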
Keywords