Case Studies in Chemical and Environmental Engineering (Jun 2021)
An analysis to identify the important variables for the spread of COVID-19 using numerical techniques and data science
Abstract
From a systems-theory perspective, the socio-economic variables that constitute a society should be able to capture a system response such as the number of weekly COVID-19 cases. This paper presents a numerical approach to answer two vital questions: which variables are more important, and how many variables are needed to capture the dynamics of the spread? Using the theory of least squares regression, two types of problems were set up and solved: multilinear regression (MLR) and a nonlinear power function, referred to as NLR in this study. Numerical techniques were applied to pre- and post-process the data and the vast number of outputs. A total of 43 socio-economic and meteorological variables from 31 counties in California, United States, resulted in about 37.4 million combinations for the analysis. Results show that variables related to total population, household income, occupation, and transportation are more important than the others. The frequency with which a variable attains a higher correlation increases as more variables are combined with it. Similarly, correlation increases as the number of variables in a combination increases. Some 5-variable combinations can capture the dynamics of the spread with high accuracy, with correlation coefficients as high as 0.985.
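The combination search described above can be illustrated with a minimal sketch. It covers only the MLR case and is not the authors' implementation: the data are synthetic, the function names `fit_mlr` and `rank_combinations` are hypothetical, and the correlation coefficient is assumed here to be the Pearson correlation between fitted and observed values.

```python
from itertools import combinations

import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of y ~ b0 + X @ b.

    Returns the coefficients and the Pearson correlation between
    fitted and observed values (an assumed definition of the score).
    """
    A = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ coef
    r = np.corrcoef(y, y_hat)[0, 1]
    return coef, r


def rank_combinations(X, y, k):
    """Fit every k-variable combination and sort by correlation, best first."""
    results = []
    for cols in combinations(range(X.shape[1]), k):
        _, r = fit_mlr(X[:, list(cols)], y)
        results.append((float(r), cols))
    results.sort(reverse=True)
    return results


# Synthetic stand-in data (NOT the paper's data): 31 "counties", 6 variables,
# with the response driven by variables 0 and 3 plus a little noise.
rng = np.random.default_rng(0)
n_counties, n_vars = 31, 6
X = rng.normal(size=(n_counties, n_vars))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=n_counties)

ranked = rank_combinations(X, y, k=2)
best_r, best_cols = ranked[0]
print(best_cols, round(best_r, 3))
```

On this synthetic data the top-ranked pair is the one that generated the response. Counting how often each variable appears among the high-correlation combinations would give a frequency-based importance measure of the kind the abstract describes, although the exact counting rule used in the paper is not stated here.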