npj Digital Medicine (Mar 2021)

Deep-learning system to improve the quality and efficiency of volumetric heart segmentation for breast cancer

  • Roman Zeleznik,
  • Jakob Weiss,
  • Jana Taron,
  • Christian Guthier,
  • Danielle S. Bitterman,
  • Cindy Hancox,
  • Benjamin H. Kann,
  • Daniel W. Kim,
  • Rinaa S. Punglia,
  • Jeremy Bredfeldt,
  • Borek Foldyna,
  • Parastou Eslami,
  • Michael T. Lu,
  • Udo Hoffmann,
  • Raymond Mak,
  • Hugo J. W. L. Aerts

DOI
https://doi.org/10.1038/s41746-021-00416-5
Journal volume & issue
Vol. 4, no. 1
pp. 1 – 7

Abstract

Although artificial intelligence algorithms are often developed and applied for narrow tasks, their implementation in other medical settings could help to improve patient care. Here we assess whether a deep-learning system for volumetric heart segmentation on computed tomography (CT) scans, developed in cardiovascular radiology, can optimize treatment planning in radiation oncology. The system was trained using multi-center data (n = 858) with manual heart segmentations provided by cardiovascular radiologists. The system was validated in an independent real-world dataset of 5677 breast cancer patients treated with radiation therapy at the Dana-Farber/Brigham and Women’s Cancer Center between 2008 and 2018. In a subset of 20 patients, the performance of the system was compared to that of eight radiation oncology experts by assessing segmentation time, agreement between experts, and accuracy with and without deep-learning assistance. To compare the performance to segmentations used in the clinic, concordance and failures (defined as Dice < 0.85) of the system were evaluated in the entire dataset. The system was successfully applied without retraining. With deep-learning assistance, median segmentation time decreased significantly (4.0 min [IQR 3.1–5.0] without vs. 2.0 min [IQR 1.3–3.5] with assistance; p < 0.001), and agreement between experts increased (Dice 0.95 [IQR = 0.02] vs. 0.97 [IQR = 0.02]; p < 0.001). Expert accuracy was similar with and without deep-learning assistance (Dice 0.92 [IQR = 0.02] vs. 0.92 [IQR = 0.02]; p = 0.48), and not significantly different from deep-learning-only segmentations (Dice 0.92 [IQR = 0.02]; p ≥ 0.1). In comparison to real-world data, the system showed high concordance (Dice 0.89 [IQR = 0.06]) across 5677 patients and a significantly lower failure rate (p < 0.001). These results suggest that deep-learning algorithms can successfully be applied across medical specialties and improve clinical care beyond the original field of interest.
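
For readers unfamiliar with the metric, the Dice coefficient reported above measures the overlap between two segmentations A and B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). The following is a minimal illustrative sketch, not the authors' code: it assumes each segmentation is given as a set of (x, y, z) voxel coordinates, and the names `dice`, `is_failure`, and `FAILURE_THRESHOLD` are hypothetical. Only the failure cutoff of 0.85 is taken from the abstract.

```python
from typing import Iterable, Tuple

Voxel = Tuple[int, int, int]

# Cutoff stated in the abstract: a segmentation counts as a failure if Dice < 0.85.
FAILURE_THRESHOLD = 0.85


def dice(seg_a: Iterable[Voxel], seg_b: Iterable[Voxel]) -> float:
    """Dice coefficient between two segmentations given as voxel coordinates."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        # Assumption for this sketch: two empty masks agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def is_failure(score: float, threshold: float = FAILURE_THRESHOLD) -> bool:
    """Flag a segmentation as failed when its Dice score is below the threshold."""
    return score < threshold


# Toy example: a 4x4x4 reference cube and a prediction shifted by one voxel in x.
reference = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
predicted = {(x, y, z) for x in range(1, 5) for y in range(4) for z in range(4)}

score = dice(reference, predicted)
print(f"Dice = {score:.2f}, failure = {is_failure(score)}")
```

In this toy case the two 64-voxel cubes share 48 voxels, so the Dice score is 2 × 48 / 128 = 0.75, which falls below the 0.85 cutoff. In practice segmentations are usually stored as 3D binary arrays, and the same formula applies to the voxel counts of the masks and their intersection.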