Big Data Mining and Analytics (Dec 2022)

Influencing Factors and Clustering Characteristics of COVID-19: A Global Analysis

  • Tianlong Zheng,
  • Chunli Zhang,
  • Yueting Shi,
  • Debao Chen,
  • Sheng Liu

DOI
https://doi.org/10.26599/BDMA.2022.9020010
Journal volume & issue
Vol. 5, no. 4
pp. 318 – 338

Abstract

As of 2021, the unprecedented coronavirus disease 2019 (COVID-19) pandemic is still raging in many countries worldwide. Various response strategies have been developed to study the characteristics and distribution of the virus in different regions of the world and to assist in the prevention and control of the epidemic. In this study, descriptive statistics and regression analysis were applied to COVID-19 data from different countries to compare and evaluate several regression models. The results showed that the extreme random forest regression (ERFR) model performed best, and that in this model population density, ozone, median age, life expectancy, and the Human Development Index (HDI) were relatively influential factors in the spread and diffusion of COVID-19. In addition, the clustering characteristics of the epidemic were analyzed with the spectral clustering algorithm. The visualization of the spectral clustering results showed that the global spread of the COVID-19 pandemic was highly clustered geographically, and that its clustering characteristics and influencing factors were also somewhat consistent in their distribution. This study aims to deepen the international community's understanding of the global COVID-19 pandemic, so that countries worldwide can develop measures to mitigate potential large-scale outbreaks and improve their ability to respond to such public health emergencies.

Keywords