Journal of Medical Internet Research (Nov 2020)

Leveraging Internet Search Data to Improve the Prediction and Prevention of Noncommunicable Diseases: Retrospective Observational Study

  • Xu, Chenjie
  • Cao, Zhi
  • Yang, Hongxi
  • Gao, Ying
  • Sun, Li
  • Hou, Yabing
  • Cao, Xinxi
  • Jia, Peng
  • Wang, Yaogang

Journal volume & issue
Vol. 22, no. 11
p. e18998



Background: As human society enters an era of vast and easily accessible social media, a growing number of people are using the internet to search for and exchange medical information. Because internet search data can reflect population interest in particular health topics, they offer a new way of understanding health concerns about noncommunicable diseases (NCDs) and the role these data could play in NCD prevention.

Objective: We aimed to explore the association of internet search data for NCDs with published disease incidence and mortality rates in the United States and to understand public health concerns about NCDs.

Methods: We tracked NCDs by examining the correlations among the incidence rates, mortality rates, and internet searches in the United States from 2004 to 2017, and we established forecast models based on the relationship between the disease rates and internet searches.

Results: Incidence and mortality rates of 29 diseases in the United States were statistically significantly correlated with the relative search volumes (RSVs) of their search terms (P<.05). Judged by the goodness of fit of the multiple regression prediction models, the fit was closest to 1 for diabetes mellitus, stroke, atrial fibrillation and flutter, Hodgkin lymphoma, and testicular cancer; the coefficients of determination of their linear regression models for predicting incidence were 80%, 88%, 96%, 80%, and 78%, respectively. The corresponding coefficients of determination for predicting mortality were 82%, 62%, 94%, 78%, and 62%, respectively.

Conclusions: An advanced understanding of search behaviors could augment traditional epidemiologic surveillance and could be used as a reference to aid in disease prediction and prevention.
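
The two quantities reported above, the correlation between search volume and a disease rate and the coefficient of determination of a fitted linear model, can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses a single predictor rather than the multiple regression described in the abstract, the function names `pearson_r`, `fit_line`, and `r_squared` are hypothetical, and the numbers in the usage section are made-up toy values, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy values for illustration only (NOT from the study):
    # yearly relative search volume and a yearly disease rate.
    rsv = [10.0, 20.0, 30.0, 40.0, 50.0]
    rate = [1.1, 1.9, 3.2, 3.9, 5.1]

    r = pearson_r(rsv, rate)
    slope, intercept = fit_line(rsv, rate)
    r2 = r_squared(rsv, rate, slope, intercept)
    print(f"r = {r:.3f}, slope = {slope:.3f}, R^2 = {r2:.3f}")
```

With a single predictor, the coefficient of determination equals the square of the Pearson correlation, so the two printed values are directly related; with several predictors, as in the multiple regression models the abstract mentions, that identity no longer holds and the coefficient of determination has to be computed from the residuals.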