Performance Evaluation of Regression Models for the Prediction of the COVID-19 Reproduction Rate

Jayakumar Kaliappan; Kathiravan Srinivasan; Saeed Mian Qaisar; Karpagam Sundararajan; Chuan-Yu Chang; Suganthan C

doi:10.3389/fpubh.2021.729795

Frontiers in Public Health (Sep 2021)

Affiliations

Jayakumar Kaliappan: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
Kathiravan Srinivasan: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
Saeed Mian Qaisar: Electrical and Computer Engineering Department, Effat University, Jeddah, Saudi Arabia
Karpagam Sundararajan: School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
Chuan-Yu Chang: Department of Computer Science and Information Engineering, National Yunlin University of Science and Technology, Douliu, Taiwan
Suganthan C: School of Social Sciences and Languages, Vellore Institute of Technology, Vellore, India

DOI: https://doi.org/10.3389/fpubh.2021.729795
Journal volume & issue: Vol. 9

Abstract

This paper evaluates the performance of multiple non-linear regression techniques, namely support-vector regression (SVR), k-nearest neighbor (KNN), Random Forest Regressor, Gradient Boosting, and XGBOOST, for COVID-19 reproduction rate prediction, and studies the impact of feature selection algorithms and hyperparameter tuning on prediction. Sixteen features (for example, Total_cases_per_million and Total_deaths_per_million) related to significant factors, such as testing, death, positivity rate, active cases, stringency index, and population density, are considered for the COVID-19 reproduction rate prediction. These 16 features are ranked using Random Forest, Gradient Boosting, and XGBOOST feature selection algorithms. Seven of the 16 features are selected according to the ranks assigned by most of the above-mentioned feature-selection algorithms. Predictions by historical statistical models are based solely on the predicted feature and the assumption that future instances resemble past occurrences. In contrast, techniques such as Random Forest, XGBOOST, Gradient Boosting, KNN, and SVR consider the influence of other significant features when predicting the result. The performance of reproduction rate prediction is measured by the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), R-Squared, relative absolute error (RAE), and root relative squared error (RRSE) metrics. The performances of the algorithms with and without feature selection are similar, but a remarkable difference is seen with hyperparameter tuning. The results suggest that the reproduction rate is highly dependent on many features, and that the prediction should not be based solely upon past values. Without hyperparameter tuning, the minimum value of RAE is 0.117315935 with feature selection and 0.0968989 without feature selection. KNN attains a low MAE value of 0.0008 and performs well without feature selection and with hyperparameter tuning. The results show that predictions performed using all features and hyperparameter tuning are more accurate than predictions performed using selected features.
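
The six metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the code below assumes the standard textbook definitions, in which RAE and RRSE compare the model's error with that of a baseline that always predicts the mean of the observed values. The function name `regression_metrics` and its dictionary keys are hypothetical and are not taken from the paper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE, R-Squared, RAE, and RRSE.

    Assumes the standard definitions of these metrics; the paper's
    exact formulas are not stated in the abstract.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n

    # Error of the model's predictions.
    abs_err = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

    # Error of the baseline that always predicts the observed mean.
    abs_dev = sum(abs(t - mean_true) for t in y_true)
    sq_dev = sum((t - mean_true) ** 2 for t in y_true)
    if sq_dev == 0:
        raise ValueError("relative metrics are undefined for constant y_true")

    mse = sq_err / n
    return {
        "MAE": abs_err / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "R2": 1.0 - sq_err / sq_dev,
        "RAE": abs_err / abs_dev,
        "RRSE": math.sqrt(sq_err / sq_dev),
    }
```

Under these definitions, RAE and RRSE values below 1 indicate that a model does better than the mean-only baseline, which is one way to read the RAE values reported above.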

Published in Frontiers in Public Health

ISSN: 2296-2565 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Public aspects of medicine
Website: https://www.frontiersin.org/journals/public-health
