Journal of Big Data (Sep 2018)
Infoveillance of infectious diseases in USA: STDs, tuberculosis, and hepatitis
Abstract
Big Data Analytics has become an integral part of Health Informatics in recent years, and the analysis of Internet data is increasingly popular for health assessment across a range of topics. In this study, we first examine the geographical distribution of online behavioral variations towards Chlamydia, Gonorrhea, Syphilis, Tuberculosis, and Hepatitis in the United States, by year, from 2004 to 2017. Next, we examine the correlations between Google Trends data and official health data on these diseases from the Centers for Disease Control and Prevention (CDC), and then estimate linear regressions for the respective relationships. The results show that Infoveillance can assist in exploring public awareness and in accurately measuring behavioral changes towards these diseases. The correlations between Google Trends data and CDC data on Chlamydia cases are statistically significant at the national level and in most states, and the forecasts perform well in many states. For Hepatitis, significant correlations are observed in several US states, and forecasting also shows promising results. However, several factors can limit the applicability of this forecasting method, as in the cases of Gonorrhea, Syphilis, and Tuberculosis, where the correlations are statistically significant in fewer states. This study therefore highlights that Google Trends data should be analyzed with caution for the results to be robust. In addition, we suggest that the applicability of this method is neither trivial nor universal, and that several factors need to be taken into account when using online data in this line of research.
Nevertheless, this study also supports previous findings that the analysis of real-time online data is important in health assessment, as it circumvents the lengthy process of data collection and analysis in traditional survey methods and provides information that would otherwise be inaccessible.
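The abstract pairs a correlation step with a linear regression step. The following is a minimal sketch of both, assuming yearly Google Trends interest values and CDC case counts have already been aligned by year for one region. The abstract does not name the correlation coefficient or the tooling used, so the choice of Pearson's r, the function names, and the synthetic numbers in the usage note are all hypothetical illustrations, not the study's method or data.

```python
import math
from typing import Sequence, Tuple


def _centered_sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return means and centered sums (mx, my, sxy, sxx, syy) for two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (assumed coefficient) between search interest x and case counts y."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, to be compared with Student's t on n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        # A perfect correlation gives an unbounded t statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical, synthetic series (not CDC or Google Trends data).
    interest = [10.0, 20.0, 30.0, 40.0]
    cases = [25.0, 45.0, 65.0, 85.0]
    r = pearson_r(interest, cases)
    slope, intercept = fit_ols(interest, cases)
    print(r, slope, intercept)
    # Forecast case counts for a new interest value with the fitted line.
    print(intercept + slope * 50.0)
```

For the synthetic series above, the fitted line is exactly `cases = 5 + 2 * interest`, so `r` is 1.0. In real data, a p-value would come from comparing `t_statistic(r, n)` with a Student's t distribution (for example via a statistics library), which this stdlib-only sketch does not compute.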
Keywords