IEEE Access (Jan 2023)
Accuracy and Performance of Machine Learning Methodologies: Novel Assessments of Country Pandemic Vulnerability Based on Non-Pandemic Predictors
Abstract
The devastating effects of the COVID-19 pandemic created a need for sensitive and accurate machine learning methodologies for assessing predictors of pandemic vulnerability. This study assessed how well machine learning methodologies correlate, predict, and rank selected demographic, health, and economic public health parameters relative to COVID-19 case fatality rates in 26 countries. Random Forest Regressor (RFR) and Extreme Gradient Boosting (XGBoost) models, both with distributed lags, a novel K-means-Coefficient of Variance (K-means-COV) sensitivity analysis approach, and Ordinary Least Squares Multifactor Regression were used to evaluate the correlation of predictive non-pandemic features, grouped into two novel public health indices: the Population Health Index (PHI) and the Country Health Index (CHI). A novel scoring model was developed for country-level pandemic risk assessment. Multiple analyses demonstrated that XGBoost had higher sensitivity and accuracy than RFR across all performance metrics. XGBoost identified cardiovascular death rate as the dominant predictive feature for the PHI in 46% of countries, and hospital beds per thousand people as the dominant predictive feature for the CHI, also in 46% of countries. The novel K-means-COV sensitivity analysis approach performed with high accuracy and was successfully validated across all three methods, demonstrating that female smokers was the most common predictive feature across different analysis sets. All assessed machine learning methodologies performed with high accuracy and demonstrated strong predictive value. Only 42.3% of countries under the PHI and 15.4% under the CHI were identified as having a low pandemic vulnerability risk.
Keywords