Scientific Data (Jan 2024)
Enrichment of lung cancer computed tomography collections with AI-derived annotations
Abstract
Abstract Public imaging datasets are critical for the development and evaluation of automated tools in cancer imaging. Unfortunately, many do not include annotations or image-derived features, complicating downstream analysis. Artificial intelligence-based annotation tools have been shown to achieve acceptable performance and can be used to automatically annotate large datasets. As part of the effort to enrich public data available within NCI Imaging Data Commons (IDC), here we introduce AI-generated annotations for two collections containing computed tomography images of the chest, NSCLC-Radiomics, and a subset of the National Lung Screening Trial. Using publicly available AI algorithms, we derived volumetric annotations of thoracic organs-at-risk, their corresponding radiomics features, and slice-level annotations of anatomical landmarks and regions. The resulting annotations are publicly available within IDC, where the DICOM format is used to harmonize the data and achieve FAIR (Findable, Accessible, Interoperable, Reusable) data principles. The annotations are accompanied by cloud-enabled notebooks demonstrating their use. This study reinforces the need for large, publicly accessible curated datasets and demonstrates how AI can aid in cancer imaging.