IEEE Access (Jan 2020)

Using Partial Least Squares Regression to Fit Small Data of H7N9 Incidence Based on the Baidu Index

  • Ruijing Gan,
  • Jiyong Tan,
  • Liying Mo,
  • Yu Li,
  • Daizheng Huang

DOI
https://doi.org/10.1109/ACCESS.2020.2983799
Journal volume & issue
Vol. 8
pp. 60392 – 60400

Abstract

Internet search data can help disease control departments estimate disease incidence in advance. The H7N9 epidemic that occurred in Guangxi Province is used as an example to demonstrate the association between incidence and Baidu search data. First, 16 search terms highly correlated with H7N9 were selected by expert judgment and calculation. At the same time, the number of disease cases was downloaded from the website of the Guangxi CDC. After comparing regression models, partial least squares regression was chosen for estimation, because the number of epidemic cases is far smaller than the volume of Baidu search data. Cross validation and variable importance in projection were applied to filter the independent variables. The results show that: (1) the proposed method is suitable for fitting H7N9 incidence data with few samples, and the goodness of fit is very high; (2) cross validation and variable importance in projection help to screen out the important search indices that are most closely related to the H7N9 epidemic; (3) compared with PCA-based methods, the proposed method shows clear advantages on the performance indices, especially with the help of cross validation and variable importance in projection.
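The abstract does not describe the authors' implementation, so the following is only a minimal sketch of how partial least squares regression and variable importance in projection (VIP) scores can be computed. It assumes a single response, the NIPALS formulation of PLS1, and centered but unscaled predictors. The function name `pls1_vip` and the toy data (four hypothetical search-term columns, eight observations) are illustrative and are not taken from the paper.

```python
import math


def pls1_vip(X, y, n_components):
    """Fit PLS1 with NIPALS and return one VIP score per column of X."""
    n, p = len(X), len(X[0])
    # Center each predictor column and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction of covariance between residual X and y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        # Scores, X loadings, and the y coefficient for this component.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # sum of squares of y explained

    total = sum(explained)
    # VIP_j = sqrt(p * sum_a(SS_a * w_ja^2) / sum_a(SS_a))
    return [
        math.sqrt(p * sum(ss * w[j] ** 2 for ss, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Hypothetical toy data: four mutually orthogonal "search index" columns.
c0 = [1, 1, 1, 1, -1, -1, -1, -1]
c1 = [1, 1, -1, -1, 1, 1, -1, -1]
c2 = [1, -1, 1, -1, 1, -1, 1, -1]
c3 = [1, -1, -1, 1, 1, -1, -1, 1]
X = [[c0[i], 2 * c1[i], c2[i], c3[i]] for i in range(8)]
# The response depends strongly on column 0, weakly on column 1, and
# not at all on columns 2 and 3.
y = [3.0 * row[0] + 0.25 * row[1] for row in X]

vip = pls1_vip(X, y, n_components=2)
for name, score in zip(["term_0", "term_1", "term_2", "term_3"], vip):
    print(f"{name}: VIP = {score:.3f}")
```

A common convention, not necessarily the one used in the paper, is to keep predictors whose VIP exceeds 1. The squared VIP scores sum to the number of predictors, so 1 corresponds to average importance. In practice the number of components would be chosen by cross validation rather than fixed at 2 as it is here.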

Keywords