Radiation Oncology (May 2024)

Evaluation of multiple-vendor AI autocontouring solutions

  • Lee Goddard
  • Christian Velten
  • Justin Tang
  • Karin A. Skalina
  • Robert Boyd
  • William Martin
  • Amar Basavatia
  • Madhur Garg
  • Wolfgang A. Tomé

DOI
https://doi.org/10.1186/s13014-024-02451-4
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 8

Abstract

Background

Multiple artificial intelligence (AI)-based autocontouring solutions have become available, each promising high accuracy and time savings compared with manual contouring. Before implementing AI-driven autocontouring in clinical practice, three commercially available CT-based solutions were evaluated.

Materials and methods

The following solutions were evaluated in this work: MIM-ProtégéAI+ (MIM), Radformation-AutoContour (RAD), and Siemens-DirectORGANS (SIE). Sixteen organs that all three solutions could contour were identified. For each organ, ten patients with manually generated contours approved by the treating physician (AP) were identified, for a total of forty-seven distinct patients. Supine CT scans were acquired on a Siemens-SOMATOMgo 64-slice helical scanner and used to generate the autocontours. At least three physicians scored contour accuracy on a five-point Likert scale. Dice similarity coefficient (DSC), Hausdorff distance (HD) and mean distance to agreement (MDA) were calculated by comparing the AI contours with the “ground truth” AP contours.

Results

The average physician score ranged from 1.00, indicating that all physicians rated the contour as clinically acceptable with no modifications necessary, to 3.70, indicating that changes were required and that modifying the structures would likely take as long as or longer than contouring them manually. Averaged across all sixteen structures, the physician scores were 2.02 for the AP contours, 2.07 for MIM, 1.96 for RAD and 1.99 for SIE. DSC ranged from 0.37 to 0.98, with 41/48 (85.4%) contours having an average DSC ≥ 0.7. Average HD ranged from 2.9 to 43.3 mm, and average MDA from 0.6 to 26.1 mm.

Conclusions

The results of our comparison demonstrate that each vendor’s AI contouring solution performed similarly to manual contouring.
There were a small number of cases where unusual anatomy led to poor scores with one or more of the solutions. The consistency and comparable performance of all three vendors’ solutions suggest that radiation oncology centers can confidently choose any of the evaluated solutions based on individual preferences, resource availability, and compatibility with their existing clinical workflows. Although AI-based contouring may result in high-quality contours for the majority of patients, a minority of patients require manual contouring and more in-depth physician review.
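The abstract names DSC, HD and MDA but does not state exactly how the distances were computed. The following is a minimal pure-Python sketch of these three metrics, assuming binary masks stored as sets of voxel indices, surfaces taken as the 6-connected boundary voxels, brute-force nearest-neighbour search, and MDA taken as the average of the two directed mean surface distances (one common convention among several). The function names, the voxel spacing and the toy cube structures are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]  # (slice, row, column) index into the CT grid

# 6-connected neighbour offsets used to decide whether a voxel lies on the boundary.
_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) on voxel sets."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask: Set[Voxel]) -> Set[Voxel]:
    """Voxels of the mask that have at least one 6-connected neighbour outside it."""
    return {
        v for v in mask
        if any((v[0] + dz, v[1] + dy, v[2] + dx) not in mask for dz, dy, dx in _NEIGHBOURS)
    }


def _nearest_mm(src: Set[Voxel], dst: Set[Voxel], spacing: Sequence[float]) -> List[float]:
    """For each src voxel, the distance in mm to the closest dst voxel (brute force)."""
    dst_mm = [tuple(i * s for i, s in zip(q, spacing)) for q in dst]
    out = []
    for p in src:
        p_mm = tuple(i * s for i, s in zip(p, spacing))  # voxel index -> physical mm
        out.append(min(math.dist(p_mm, q) for q in dst_mm))
    return out


def hausdorff_and_mda(a: Set[Voxel], b: Set[Voxel], spacing: Sequence[float]) -> Tuple[float, float]:
    """Symmetric Hausdorff distance and mean distance to agreement between surfaces, in mm."""
    sa, sb = surface(a), surface(b)
    d_ab = _nearest_mm(sa, sb, spacing)
    d_ba = _nearest_mm(sb, sa, spacing)
    hd = max(max(d_ab), max(d_ba))
    # Assumed MDA convention: average of the two directed mean surface distances.
    mda = 0.5 * (sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba))
    return hd, mda


def cube(z0: int, size: int = 4) -> Set[Voxel]:
    """Hypothetical test structure: a size^3 block of voxels starting at slice z0."""
    return {(z, y, x) for z in range(z0, z0 + size) for y in range(size) for x in range(size)}


# Toy example: the "AI" contour is the "AP" contour shifted by one 3 mm slice.
spacing_mm = (3.0, 1.0, 1.0)  # assumed (slice thickness, row, column) spacing in mm
ap, ai = cube(0), cube(1)
dsc = dice(ai, ap)
hd, mda = hausdorff_and_mda(ai, ap, spacing_mm)
print(f"DSC={dsc:.3f} HD={hd:.1f} mm MDA={mda:.3f} mm")
# DSC=0.750 HD=3.0 mm MDA=0.929 mm
```

The toy case shows why the metrics complement each other: a one-slice shift leaves DSC at 0.75, HD reports the worst-case 3 mm gap, and MDA stays below 1 mm because most surface voxels still coincide. The brute-force search scales as the product of the two surface sizes, so it suits only toy inputs; clinical-size structures would more typically use a distance transform or a spatial index.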

Keywords