Insights into Imaging (Aug 2022)

Quality assurance for automatically generated contours with additional deep learning

  • Lars Johannes Isaksson,
  • Paul Summers,
  • Abhir Bhalerao,
  • Sara Gandini,
  • Sara Raimondi,
  • Matteo Pepa,
  • Mattia Zaffaroni,
  • Giulia Corrao,
  • Giovanni Carlo Mazzola,
  • Marco Rotondi,
  • Giuliana Lo Presti,
  • Zaharudin Haron,
  • Sara Alessi,
  • Paola Pricolo,
  • Francesco Alessandro Mistretta,
  • Stefano Luzzago,
  • Federica Cattani,
  • Gennaro Musi,
  • Ottavio De Cobelli,
  • Marta Cremonesi,
  • Roberto Orecchia,
  • Giulia Marvaso,
  • Giuseppe Petralia,
  • Barbara Alicja Jereczek-Fossa

DOI: https://doi.org/10.1186/s13244-022-01276-7
Journal volume & issue: Vol. 13, no. 1, pp. 1–10

Abstract

Objective: Deploying an automatic segmentation model in practice should require rigorous quality assurance (QA) and continuous monitoring of the model’s use and performance, particularly in high-stakes scenarios such as healthcare. Currently, however, tools to assist with QA for such models are not available to AI researchers. In this work, we build a deep learning model that estimates the quality of automatically generated contours.

Methods: The model was trained to predict segmentation quality by outputting an estimate of the Dice similarity coefficient given an image-contour pair as input. Our dataset contained 60 axial T2-weighted MRI images of prostates with ground truth segmentations along with 80 automatically generated segmentation masks. The model was a 3D version of the EfficientDet architecture with a custom regression head. For validation, we used fivefold cross-validation. To counteract the limitation of the small dataset, we used an extensive data augmentation scheme capable of producing virtually infinite training samples from a single ground truth label mask. In addition, we compared the results against a baseline model that uses only clinical variables for its predictions.

Results: Our model achieved a mean absolute error of 0.020 ± 0.026 (2.2% mean percentage error) in estimating the Dice score, with a rank correlation of 0.42. Furthermore, the model correctly identified incorrect segmentations (defined in terms of acceptable/unacceptable) 99.6% of the time.

Conclusion: We believe that the trained model can be used alongside automatic segmentation tools to ensure quality and thus allow intervention to prevent undesired segmentation behavior.
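For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the Dice similarity coefficient (the value the model is trained to estimate) and of the mean absolute error used to report how close the estimates are. It is not the authors' code: the function names, the flattened-list mask representation, and the convention that two empty masks score 1.0 are all assumptions made for illustration, and the numbers are toy values rather than data from the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    Dice = 2 * |A and B| / (|A| + |B|), ranging from 0 (no overlap) to 1.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_absolute_error(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between estimated and actual Dice scores."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Toy example: a six-voxel "ground truth" and an "automatic" contour.
truth_mask = [0, 1, 1, 1, 0, 0]
auto_mask = [0, 0, 1, 1, 1, 0]
dsc = dice_coefficient(auto_mask, truth_mask)  # 2 * 2 / (3 + 3)

# Hypothetical estimated vs. actual Dice scores for two contours.
mae = mean_absolute_error([0.90, 0.80], [0.88, 0.85])
```

A quality-estimation model of the kind described would replace the explicit `dice_coefficient` call at deployment time, since no ground truth mask is available there; the explicit computation is only possible on labeled data, where it supplies the regression target.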

Keywords