Scientific Data (May 2024)

SAROS: A dataset for whole-body region and organ segmentation in CT imaging

  • Sven Koitka,
  • Giulia Baldini,
  • Lennard Kroll,
  • Natalie van Landeghem,
  • Olivia B. Pollok,
  • Johannes Haubold,
  • Obioma Pelka,
  • Moon Kim,
  • Jens Kleesiek,
  • Felix Nensa,
  • René Hosch

DOI
https://doi.org/10.1038/s41597-024-03337-6
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 10

Abstract


The Sparsely Annotated Region and Organ Segmentation (SAROS) dataset was created using data from The Cancer Imaging Archive (TCIA) to provide a large open-access CT dataset with high-quality annotations of body landmarks. In-house segmentation models were employed to generate annotation proposals on randomly selected cases from TCIA. The dataset includes 13 semantic body region labels (abdominal/thoracic cavity, bones, brain, breast implant, mediastinum, muscle, parotid/submandibular/thyroid glands, pericardium, spinal cord, subcutaneous tissue) and six body part labels (left/right arm/leg, head, torso). Case selection was based on the DICOM series description, gender, and imaging protocol, resulting in 882 patients (438 female) for a total of 900 CTs. Proposals were manually reviewed and corrected in a continuous quality control cycle. Only every fifth axial slice was annotated, yielding 20,150 annotated slices from 28 data collections. To ensure reproducibility on downstream tasks, five cross-validation folds and a test set were pre-defined. The SAROS dataset serves as an open-access resource for training and evaluating novel segmentation models, covering various scanner vendors and diseases.
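Because only every fifth axial slice carries labels, a training pipeline has to avoid treating the unannotated slices as background. One common approach is to restrict the loss to annotated slices. The following is a minimal, hypothetical sketch in plain Python. It assumes annotated slices start at index 0 (the `offset` parameter) and that a per-slice loss has already been computed. The function names are illustrative and are not part of the SAROS release. The actual label files may also mark unannotated slices differently, for example with a dedicated label value, so check the dataset documentation before relying on a fixed stride and offset.

```python
from typing import List, Sequence

# Stride taken from the abstract ("only every fifth axial slice was annotated").
ANNOTATION_STRIDE = 5


def annotated_slice_indices(num_slices: int, stride: int = ANNOTATION_STRIDE,
                            offset: int = 0) -> List[int]:
    """Return the axial slice indices that carry labels.

    Assumption: annotation starts at `offset` and repeats every `stride` slices.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    if not 0 <= offset < stride:
        raise ValueError("offset must lie in [0, stride)")
    return list(range(offset, num_slices, stride))


def sparse_slice_mask(num_slices: int, stride: int = ANNOTATION_STRIDE,
                      offset: int = 0) -> List[bool]:
    """Boolean mask over all slices: True where the slice is annotated."""
    annotated = set(annotated_slice_indices(num_slices, stride, offset))
    return [z in annotated for z in range(num_slices)]


def masked_mean_loss(per_slice_loss: Sequence[float], mask: Sequence[bool]) -> float:
    """Average a per-slice loss over annotated slices only.

    Unannotated slices contribute nothing, so unlabeled anatomy is never
    penalized as if it were background.
    """
    if len(per_slice_loss) != len(mask):
        raise ValueError("loss and mask must have the same length")
    selected = [loss for loss, keep in zip(per_slice_loss, mask) if keep]
    if not selected:
        raise ValueError("no annotated slices in this volume")
    return sum(selected) / len(selected)
```

For a hypothetical 6-slice volume, `sparse_slice_mask(6)` marks slices 0 and 5. With per-slice losses `[1.0, 9.0, 9.0, 9.0, 9.0, 3.0]`, `masked_mean_loss` averages only 1.0 and 3.0. The large losses on the unlabeled slices in between are ignored.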