Scientific Reports (Aug 2023)

Machine learning and statistical models for analyzing multilevel patent data

  • Sunyun Qi,
  • Yu Zhang,
  • Hua Gu,
  • Fei Zhu,
  • Meiying Gao,
  • Hongxiao Liang,
  • Qifeng Zhang,
  • Yanchao Gao

DOI
https://doi.org/10.1038/s41598-023-37922-3
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 10

Abstract

Read online

Abstract A recent surge of patent applications among public hospitals in China has aroused significant research interest. A country’s healthcare innovation capacity can be measured by its number of patents. This paper explores the link between the number of patents and ten independent variables. Multicollinearity was carefully detected and removed by using the variable selection method and LASSO regression, respectively. The Poisson model and the negative binomial model were proposed to analyze the patent data. Three goodness of fit tests, the Pearson test, the deviance test, and the DHARMa non-parametric dispersion test, were conducted to investigate if the model has a good fit. After discovering four clusters by conducting agglomerative hierarchical clustering, these two models were replaced by the negative binomial mixed model. The likelihood ratio test was used to determine which model is more appropriate and the results reveal that the negative binomial mixed model outperforms both the Poisson model and the negative binomial model. Three variables, number of health technicians per 10,000 people, financial expenditure on science and technology as well as number of patent applications per 10,000 health personnel, have a significantly positive relationship with the number of patents in Chinese tertiary public hospitals.