Scientific Data (Oct 2024)

AI-Generated Annotations Dataset for Diverse Cancer Radiology Collections in NCI Image Data Commons

  • Gowtham Krishnan Murugesan,
  • Diana McCrumb,
  • Mariam Aboian,
  • Tej Verma,
  • Rahul Soni,
  • Fatima Memon,
  • Keyvan Farahani,
  • Linmin Pei,
  • Ulrike Wagner,
  • Andrey Y. Fedorov,
  • David Clunie,
  • Stephen Moore,
  • Jeff Van Oss

DOI
https://doi.org/10.1038/s41597-024-03977-8
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 18

Abstract

The National Cancer Institute (NCI) Image Data Commons (IDC) offers publicly available cancer radiology collections for cloud computing, which are crucial for developing advanced imaging tools and algorithms. Despite their potential, these collections are minimally annotated; only 4% of the DICOM studies in the collections considered in this project had existing segmentation annotations. This project increases the number of segmentations available in various IDC collections. We produced a high-quality dataset of AI-generated imaging annotations of tissues, organs, and/or cancers for 11 distinct IDC image collections. These collections contain images from a variety of modalities, including computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET), and cover various body parts, such as the chest, breast, kidneys, prostate, and liver. A portion of the AI annotations was reviewed and corrected by a radiologist to assess the performance of the AI models. Both the AI's and the radiologist's annotations were encoded in conformance with the Digital Imaging and Communications in Medicine (DICOM) standard, allowing for seamless integration into the IDC collections as third-party analysis collections. All the models, images, and annotations are publicly accessible.