Data in Brief (Dec 2024)

Dataset for gastrointestinal tract segmentation on serial MRIs for abdominal tumor radiotherapy

  • Sangjune L. Lee,
  • Poonam Yadav,
  • Yin Li,
  • Jason J. Meudt,
  • Jessica Strang,
  • Dustin Hebel,
  • Alyx Alfson,
  • Stephanie J. Olson,
  • Tera R. Kruser,
  • Jennifer B. Smilowitz,
  • Kailee Borchert,
  • Brianne Loritz,
  • Laila Gharzai,
  • Shervin Karimpour,
  • John Bayouth,
  • Michael F. Bassetti

Journal volume & issue
Vol. 57
p. 111159

Abstract

Purpose: Integrated MRI and linear accelerator systems (MR-Linacs) provide superior soft tissue contrast and the capability to adapt radiotherapy plans to daily changes in anatomy. In this dataset, serial MRIs of the abdomen of patients undergoing radiotherapy were collected, and the luminal gastrointestinal tract was segmented to support an online segmentation algorithm competition. Radiation oncologists, medical physicists, and data scientists may use this dataset to further improve auto-segmentation algorithms.
Acquisition and validation of methods: Serial 0.35T MRIs were collected from patients who were treated on an MR-Linac for tumors located in the abdomen. The stomach, small intestine, and large intestine were manually segmented on all MRIs by a team of annotators under the supervision of a board-certified radiation oncologist. Annotator segmentations were validated on 4 representative abdominal MRIs by comparing them to the radiation oncologist's contours using the 3D Hausdorff distance and the 3D Dice coefficient.
Data format and usage notes: The dataset includes 467 de-identified scans and their contours from 107 patients. Each patient underwent 1–5 MRI scans of the abdomen. Most of the scans consisted of 144 axial slices with a voxel size of 1.5 × 1.5 × 3 mm, for a total of 67,248 slices in the dataset. Images in DICOM format were converted into Portable Network Graphics (PNG) files. Each PNG file stores one slice of a scan, with pixels recorded in 16 bits to cover the full range of intensity values. DICOM-RT segmentations were converted into Comma-Separated Values (CSV) format. The images and annotations are publicly available at https://www.kaggle.com/ds/3577354.
Potential applications: While manual segmentations are subject to bias and inter-observer variability, the dataset has been used for the UW-Madison GI Tract Image Segmentation Challenge hosted by Kaggle and may be used for ongoing segmentation algorithm development and potentially for dose accumulation algorithms.
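
The two validation metrics named in the abstract, the 3D Dice coefficient and the 3D Hausdorff distance, can be illustrated with a minimal NumPy sketch. This is not the authors' validation code. The function names, the toy masks, and the axis order (row, column, slice) are assumptions made for illustration, and the Hausdorff distance here is computed over all foreground voxels by brute force rather than over extracted surfaces, which may differ from the definition used in the paper. The spacing is taken from the 1.5 × 1.5 × 3 mm resolution stated above.

```python
import numpy as np

# Assumed voxel spacing in mm along (row, column, slice), from the stated resolution.
SPACING_MM = (1.5, 1.5, 3.0)


def dice_3d(mask_a, mask_b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary 3D masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)


def hausdorff_3d(mask_a, mask_b, spacing=SPACING_MM):
    """Symmetric Hausdorff distance in mm between the foreground voxel sets.

    Brute force over all voxel pairs, so only suitable for small masks.
    """
    pts_a = np.argwhere(mask_a) * np.asarray(spacing)
    pts_b = np.argwhere(mask_b) * np.asarray(spacing)
    if len(pts_a) == 0 or len(pts_b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Pairwise Euclidean distances, shape (len(pts_a), len(pts_b)).
    dists = np.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    a_to_b = dists.min(axis=1).max()  # farthest A voxel from its nearest B voxel
    b_to_a = dists.min(axis=0).max()  # farthest B voxel from its nearest A voxel
    return float(max(a_to_b, b_to_a))


# Toy example (hypothetical masks, not dataset contours).
reference = np.zeros((4, 4, 2), dtype=bool)
reference[1:3, 1:3, 0] = True           # 4 voxels on the first slice
annotator = reference.copy()
annotator[1, 1, 1] = True               # one extra voxel on the next slice

print(round(dice_3d(reference, annotator), 3))  # 0.889
print(hausdorff_3d(reference, annotator))       # 3.0
```

In the toy example the annotator mask contains one extra voxel on the adjacent slice, so the Dice coefficient is 8/9 and the Hausdorff distance equals the 3 mm slice spacing. For full-size scans, a surface-based or distance-transform implementation would be far more efficient than the pairwise computation sketched here.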

Keywords