Data in Brief (Dec 2023)
A comparative dataset: Bridging COVID-19 and other diseases through epistemonikos and CORD-19 evidence
Abstract
The COVID-19 pandemic has underlined the need for reliable information for clinical decision-making and public health policies. As such, evidence-based medicine (EBM) is essential in identifying and evaluating scientific documents pertinent to novel diseases, and the accurate classification of biomedical text is integral to this process. Given this context, we introduce a comprehensive, curated dataset composed of COVID-19-related documents.This dataset includes 20,047 labeled documents that were meticulously classified into five distinct categories: systematic reviews (SR), primary study randomized controlled trials (PS-RCT), primary study non-randomized controlled trials (PS-NRCT), broad synthesis (BS), and excluded (EXC). The documents, labeled by collaborators from the Epistemonikos Foundation, incorporate information such as document type, title, abstract, and metadata, including PubMed id, authors, journal, and publication date.Uniquely, this dataset has been curated by the Epistemonikos Foundation and is not readily accessible through conventional web-scraping methods, thereby attesting to its distinctive value in this field of research. In addition to this, the dataset also includes a vast evidence repository comprising 427,870 non-COVID-19 documents, also categorized into SR, PS-RCT, PS-NRCT, BS, and EXC. This additional collection can serve as a valuable benchmark for subsequent research. The comprehensive nature of this open-access dataset and its accompanying resources is poised to significantly advance evidence-based medicine and facilitate further research in the domain.