Health Data Science (Jan 2024)

A Bibliographic Dataset of Health Artificial Intelligence Research

  • Xuanyu Shi,
  • Daoxin Yin,
  • Yongmei Bai,
  • Wenjing Zhao,
  • Xin Guo,
  • Huage Sun,
  • Dongliang Cui,
  • Jian Du

DOI
https://doi.org/10.34133/hds.0125
Journal volume & issue
Vol. 4

Abstract

Objective: This study aims to construct a curated bibliographic dataset for landscape analysis of Health Artificial Intelligence (HAI) research.

Data Source: We integrated HAI-related bibliographic records, including publications, open research datasets, patents, research grants, and clinical trials, from Medline and Dimensions.

Methods: Searching: Relevant documents were identified using Medical Subject Headings (MeSH) and Field of Research (FoR) categories indexed by two bibliographic databases, Medline and Dimensions. Extracting: MeSH terms annotated in these databases served as the primary information for our processing. For document records lacking MeSH terms, we extracted them using the Medical Text Indexer (MTI). Mapping: To enhance interoperability, HAI documents of multiple types were organized using a mapping system incorporating MeSH, FoR, the International Classification of Diseases (ICD-10), and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). Integrating: All documents were curated based on a pre-defined ontology of health problems and AI technologies from the MeSH hierarchy.

Results: We collected 96,332 HAI documents (publications: 75,820, open research datasets: 638, patents: 11,226, grants: 6,113, and clinical trials: 2,535) spanning 2009 to 2021. On average, 75.12% of the documents were tagged with at least one label related to either health problems or AI technologies (with 92.9% of publications tagged).

Summary: This study presents a comprehensive pipeline for processing and curating HAI bibliographic documents following the FAIR (Findable, Accessible, Interoperable, Reusable) standard, offering a valuable multidimensional collection for the community. This dataset serves as a crucial resource for horizontally scanning the funding, research, clinical assessments, and innovations within the HAI field.
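The tagging step and the coverage figure reported above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `ONTOLOGY` table is a toy stand-in for the pre-defined ontology, the `Document` class and the `tag` and `coverage` helpers are hypothetical names, and the sample records are invented. Only the per-type document counts are taken from the abstract.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Toy ontology (illustrative only, not the dataset's actual ontology):
# MeSH descriptor -> (dimension, label).
ONTOLOGY = {
    "Neoplasms": ("health_problem", "Neoplasms"),
    "Diabetes Mellitus": ("health_problem", "Endocrine System Diseases"),
    "Deep Learning": ("ai_technology", "Machine Learning"),
    "Natural Language Processing": ("ai_technology", "Natural Language Processing"),
}


@dataclass
class Document:
    """Hypothetical record: one bibliographic document with its MeSH terms."""
    doc_id: str
    doc_type: str  # e.g. "publication", "patent", "grant"
    mesh_terms: list[str] = field(default_factory=list)


def tag(doc: Document) -> set[tuple[str, str]]:
    """Return the (dimension, label) pairs whose MeSH descriptor appears on the document."""
    return {ONTOLOGY[term] for term in doc.mesh_terms if term in ONTOLOGY}


def coverage(docs: list[Document]) -> float:
    """Percentage of documents carrying at least one ontology label."""
    if not docs:
        return 0.0
    tagged = sum(1 for doc in docs if tag(doc))
    return 100.0 * tagged / len(docs)


# Invented sample records, used only to exercise the helpers.
sample = [
    Document("d1", "publication", ["Deep Learning", "Neoplasms"]),
    Document("d2", "patent", ["Natural Language Processing"]),
    Document("d3", "grant", ["Diabetes Mellitus"]),
    Document("d4", "clinical_trial", ["Humans"]),  # no ontology match
]

# Per-type counts as reported in the abstract.
REPORTED_COUNTS = {
    "publications": 75_820,
    "open research datasets": 638,
    "patents": 11_226,
    "grants": 6_113,
    "clinical trials": 2_535,
}

if __name__ == "__main__":
    print(f"sample coverage: {coverage(sample):.1f}%")
    print(f"reported total: {sum(REPORTED_COUNTS.values()):,}")
```

A real pipeline would walk the MeSH tree so that a descriptor inherits labels from its ancestors; the flat lookup here only keeps the sketch short.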