Scientific Reports (Nov 2024)

Analysis and prediction of infectious diseases based on spatial visualization and machine learning

  • Yunyun Cheng,
  • Yanping Bai,
  • Jing Yang,
  • Xiuhui Tan,
  • Ting Xu,
  • Rong Cheng

DOI
https://doi.org/10.1038/s41598-024-80058-1
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 22

Abstract

Infectious diseases are a global public health problem that threatens human society. Since the 1970s, newly emerging and constantly mutating viruses have been quietly attacking humanity, and at least one new infectious disease has been discovered every year. Early warning of infectious diseases can therefore greatly reduce their socio-economic harm. This study is based on COVID-19 epidemic data for China (excluding Macau and Taiwan Province) from 2020 to 2022. First, we used ArcGIS software to analyze the spatial agglomeration pattern of patient numbers across the regions of China through global spatial autocorrelation analysis, local spatial autocorrelation analysis, the center-of-gravity trajectory migration algorithm, and other statistical tools; we also screened out the areas with serious COVID-19 epidemics that require special attention. Then, the autoregressive integrated moving average model (ARIMA), extreme learning machine (ELM), support vector regression (SVR), wavelet neural network (Wavelet), recurrent neural network (RNN), and long short-term memory (LSTM) were used to predict COVID-19 epidemic data in Guangdong Province, China, and the prediction performance of each model was compared using prediction accuracy indicators. Finally, a multi-algorithm fusion learning model based on stacking is proposed to address the poor generalization ability of single-algorithm models in prediction; a radial basis function network (RBF) was used as the second-level meta-learner to fuse the above models, and particle swarm optimization (PSO) was used to optimize the RBF parameters to reduce generalization error. The experimental results show that the integrated model performs better than the single models on the COVID-19 dataset. To better apply the stacking model to the prediction of new infectious diseases, we applied the prediction model based on the COVID-19 dataset to predicting the number of AIDS and pulmonary tuberculosis (PTB) cases, which verified the wide applicability of our model in infectious disease prediction.
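
The global spatial autocorrelation analysis mentioned in the abstract is commonly measured with global Moran's I. The abstract does not state which statistic or weighting scheme was used in ArcGIS, so the following is only a minimal sketch under assumptions: the function name `morans_i`, the binary neighbour weights, and the four-region case counts are all hypothetical illustrations, not the study's data or implementation.

```python
def morans_i(values, weights):
    """Global Moran's I for regional values and a spatial weight matrix.

    values  : list of n numbers (e.g. case counts per region)
    weights : n x n nested list, weights[i][j] > 0 when regions i and j
              are treated as neighbours (assumed scheme, not from the paper)
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)      # sum of all weights
    denom = sum(d * d for d in dev)            # total squared deviation
    if s0 == 0 or denom == 0:
        raise ValueError("weights are all zero or values are constant")
    # Cross-products of deviations, counted only for neighbouring pairs
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / s0) * (num / denom)


# Hypothetical layout: four regions in a row, each adjacent to the next
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [10, 9, 2, 1]      # high values sit next to high values
alternating = [10, 1, 10, 1]   # high values sit next to low values

print(morans_i(clustered, chain))    # positive: spatial clustering
print(morans_i(alternating, chain))  # negative: spatial dispersion
```

A value well above zero suggests that similar case counts cluster in neighbouring regions, while a value below zero suggests a checkerboard-like pattern. In practice a significance test (for example, a permutation test) would accompany the statistic.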

Keywords