Investigation of Statistical Machine Learning Models for COVID-19 Epidemic Process Simulation: Random Forest, K-Nearest Neighbors, Gradient Boosting

Dmytro Chumachenko; Ievgen Meniailov; Kseniia Bazilevych; Tetyana Chumachenko; Sergey Yakovlev

doi:10.3390/computation10060086

Computation (May 2022)

Affiliations

Dmytro Chumachenko: Mathematical Modelling and Artificial Intelligence Department, National Aerospace University “Kharkiv Aviation Institute”, 71072 Kharkiv, Ukraine
Ievgen Meniailov: Mathematical Modelling and Artificial Intelligence Department, National Aerospace University “Kharkiv Aviation Institute”, 71072 Kharkiv, Ukraine
Kseniia Bazilevych: Mathematical Modelling and Artificial Intelligence Department, National Aerospace University “Kharkiv Aviation Institute”, 71072 Kharkiv, Ukraine
Tetyana Chumachenko: Epidemiology Department, Kharkiv National Medical University, 61000 Kharkiv, Ukraine
Sergey Yakovlev: Mathematical Modelling and Artificial Intelligence Department, National Aerospace University “Kharkiv Aviation Institute”, 71072 Kharkiv, Ukraine

DOI: https://doi.org/10.3390/computation10060086
Journal volume & issue: Vol. 10, no. 6, p. 86

Abstract

COVID-19 has become the largest pandemic in recent history. This study develops and investigates three models of the COVID-19 epidemic process based on statistical machine learning and evaluates their forecasting results. The models are based on the Random Forest, K-Nearest Neighbors, and Gradient Boosting methods. They were assessed for adequacy and for the accuracy of predicted incidence over forecast horizons of 3, 7, 10, 14, 21, and 30 days. The study used data on new COVID-19 cases in Germany, Japan, South Korea, and Ukraine. These countries were selected because the dynamics of the epidemic process differ among them and their governments applied different control measures to contain the pandemic. The simulation results showed that the K-Nearest Neighbors and Gradient Boosting models are accurate enough for practical use. Public health agencies can use the models and their predictions to address various pandemic containment challenges, which are examined according to the duration of the forecast.

Published in Computation

ISSN: 2079-3197 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/computation