Scientific Reports (Nov 2021)
Estimating severe fever with thrombocytopenia syndrome transmission using machine learning methods in South Korea
Abstract
Abstract Severe fever with thrombocytopenia syndrome (SFTS) is an emerging tick-borne infectious disease in China, Japan, and Korea. This study aimed to estimate the monthly SFTS occurrence and the monthly number of SFTS cases in the geographical area in Korea using epidemiological data including demographic, geographic, and meteorological factors. Important features were chosen through univariate feature selection. Two models using machine learning methods were analyzed: the classification model in machine learning (CMML) and regression model in machine learning (RMML). We developed a novel model incorporating the CMML results into RMML, defined as modified-RMML. Feature importance was computed to assess the contribution of estimating the number of SFTS cases using modified-RMML. Aspect to the accuracy of the novel model, the performance of modified-RMML was improved by reducing the MSE for the test data as 12.6–52.2%, compared to the RMML using five machine learning methods. During the period of increasing the SFTS cases from May to October, the modified-RMML could give more accurate estimation. Computing the feature importance, it is clearly observed that climate factors such as average maximum temperature, precipitation as well as mountain visitors, and the estimation of SFTS occurrence obtained from CMML had high Gini importance. The novel model incorporating CMML and RMML models improves the accuracy of the estimation of SFTS cases. Using the model, climate factors, including temperature, relative humidity, and mountain visitors play important roles in transmitting SFTS in Korea. Our findings highlighted that the guidelines for mountain visitors to prevent SFTS transmissions should be addressed. Moreover, it provides important insights for establishing control interventions that predict early identification of SFTS cases.