npj Precision Oncology (Feb 2024)

Machine learning and radiomics for segmentation and classification of adnexal masses on ultrasound

  • Jennifer F. Barcroft,
  • Kristofer Linton-Reid,
  • Chiara Landolfo,
  • Maya Al-Memar,
  • Nina Parker,
  • Chris Kyriacou,
  • Maria Munaretto,
  • Martina Fantauzzi,
  • Nina Cooper,
  • Joseph Yazbek,
  • Nishat Bharwani,
  • Sa Ra Lee,
  • Ju Hee Kim,
  • Dirk Timmerman,
  • Joram Posma,
  • Luca Savelli,
  • Srdjan Saso,
  • Eric O. Aboagye,
  • Tom Bourne

DOI
https://doi.org/10.1038/s41698-024-00527-8
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 10

Abstract

Ultrasound-based models exist to support the classification of adnexal masses, but they are subjective and rely on ultrasound expertise. We aimed to develop an end-to-end machine learning (ML) model capable of automating the classification of adnexal masses. In this retrospective study, transvaginal ultrasound scan images with linked diagnoses (ultrasound subjective assessment or histology) were extracted and segmented from Imperial College Healthcare, UK (ICH development dataset; n = 577 masses; 1444 images) and Morgagni-Pierantoni Hospital, Italy (MPH external dataset; n = 184 masses; 476 images). A segmentation and classification model was developed using convolutional neural networks and traditional radiomics features. Segmentation performance was measured with the Dice surface coefficient (DICE), and classification performance with the area under the ROC curve (AUC), F1-score and recall. Patients in the ICH and MPH datasets had a median age of 45 (IQR 35–60) and 48 (IQR 38–57) years, and the datasets consisted of 23.1% and 31.5% malignant cases, respectively. The best segmentation model achieved a DICE score of 0.85 ± 0.01, 0.88 ± 0.01 and 0.85 ± 0.01 in the ICH training, ICH validation and MPH test sets, respectively. The best classification model achieved a recall of 1.00 and F1-score of 0.88 (AUC 0.93), 0.94 (AUC 0.89) and 0.83 (AUC 0.90) in the ICH training, ICH validation and MPH test sets, respectively. We have developed an end-to-end radiomics-based model capable of adnexal mass segmentation and classification, with predictive performance (AUC 0.90) comparable to the published performance of expert subjective assessment (the gold standard) and of current risk models. Further prospective evaluation of the classification performance of this ML model against existing methods is required.
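
As a rough illustration of the evaluation metrics named in the abstract, the following sketch computes an overlap-based Dice score for a pair of binary masks, and recall and F1-score for binary malignancy labels. It is not the authors' code. It assumes the common overlap definition of Dice, 2|A∩B| / (|A| + |B|), which may differ from the exact variant used in the study. The function names and the toy inputs are hypothetical.

```python
def dice_coefficient(pred_mask, true_mask):
    """Overlap-based Dice score for two binary masks given as 2D lists.

    Assumption: standard definition 2*|A and B| / (|A| + |B|).
    """
    pred = [bool(v) for row in pred_mask for v in row]
    true = [bool(v) for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must contain the same number of pixels")
    overlap = sum(p and t for p, t in zip(pred, true))
    total = sum(pred) + sum(true)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * overlap / total


def recall_and_f1(y_true, y_pred):
    """Recall and F1-score for binary labels (1 = malignant, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return recall, f1


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    print(dice_coefficient([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
    print(recall_and_f1([1, 1, 0, 0], [1, 1, 1, 0]))
```

In the toy classification example every malignant case is detected, so recall is 1.0, while one benign case is flagged as malignant, which lowers precision and therefore F1. This is the same pattern as a recall of 1.00 reported alongside an F1-score below 1.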