International Journal of Infectious Diseases (May 2023)
GLOBAL HEALTH: AN AGILE OPEN-SOURCE REPOSITORY OF INFECTIOUS DISEASE OUTBREAKS TO SUPPORT GLOBAL SURVEILLANCE AND RESEARCH EFFORTS
Abstract
Intro: The COVID-19 pandemic highlighted the need for an open-source repository of line-list case data to support infectious disease surveillance and research. Global.health was launched in January 2020 as a global resource for public health data research. Here, we describe the data and systems underlying the Global.health datasets and summarize the project's 2.5 years of operations, including the curation of the COVID-19 and monkeypox repositories.

Methods: The COVID-19 repository is curated daily through an automated system and verified by a team of researchers. The monkeypox dataset is curated manually by a team of researchers, Monday through Friday. Both repositories include metadata fields on demographics, symptomology, disease confirmation date, and others [1,2]. Data are de-identified and ingested from trusted sources, such as government public health agencies, trusted media outlets, and established open-access repositories.

Findings: The Global.health COVID-19 dataset is the largest repository of publicly available validated line-list data in the world, with over 100 million cases from more than 100 countries and 60+ fields of metadata, comprising over 1 billion unique data points. The monkeypox dataset has over 35,000 entries from 100 countries. A total of 7,325 users accessed the COVID-19 repository and 3,005 accessed the monkeypox repository.

Conclusion: The Global.health repositories provide verified, de-identified case data for two global outbreaks and are used by CDC, WHO, and other national public health organizations for surveillance and forecasting efforts. The repositories have been used to share insights into the COVID-19 pandemic and to track the monkeypox outbreak using real-time data [3-6]. We are collaborating with the WHO Hub for Pandemic and Epidemic Intelligence to improve coordination, data schemas, and downstream use of data to inform and evaluate public health policy [7]. Future work will focus on creating a ‘turnkey’ data system that can be deployed in future outbreaks to enable quicker infectious disease surveillance.