Journal of Ovarian Research (Nov 2024)

Machine learning models in evaluating the malignancy risk of ovarian tumors: a comparative study

  • Xin He,
  • Xiang-Hui Bai,
  • Hui Chen,
  • Wei-Wei Feng

DOI
https://doi.org/10.1186/s13048-024-01544-8
Journal volume & issue
Vol. 17, no. 1
pp. 1 – 12

Abstract

Objectives: The study aimed to compare the diagnostic efficacy of machine learning models with that of expert subjective assessment (SA) in assessing the malignancy risk of ovarian tumors on transvaginal ultrasound (TVUS).

Methods: This retrospective, single-center diagnostic study included 1555 consecutive patients seen between January 2019 and May 2021. Using this dataset, Residual Network (ResNet), Densely Connected Convolutional Network (DenseNet), Vision Transformer (ViT), and Swin Transformer models were built and evaluated, both alone and combined with cancer antigen 125 (CA125). Their diagnostic performance was then compared with that of SA.

Results: Of the 1555 patients, 76.9% had benign tumors and 23.1% had malignant tumors (including borderline tumors). In differentiating malignant from benign ovarian tumors, SA had an area under the receiver operating characteristic curve (AUC) of 0.97 (95% CI, 0.93–0.99), a sensitivity of 87.2%, and a specificity of 98.4%. All machine learning models except the Vision Transformer had diagnostic performance comparable to that of the expert. The DenseNet model had an AUC of 0.91 (95% CI, 0.86–0.95), a sensitivity of 84.6%, and a specificity of 95.1%. The ResNet50 model had an AUC of 0.91 (95% CI, 0.85–0.95). The Swin Transformer model had an AUC of 0.92 (95% CI, 0.87–0.96), a sensitivity of 87.2%, and a specificity of 94.3%. The differences between the Vision Transformer and SA (AUC 0.87 vs. 0.97, P = 0.01) and between the Vision Transformer and Swin Transformer models (AUC 0.87 vs. 0.92, P = 0.04) were statistically significant. Adding CA125 did not improve the models' performance in distinguishing benign from malignant ovarian tumors.

Conclusion: Deep learning models applied to TVUS images can be used in ovarian cancer evaluation, with diagnostic performance comparable to that of expert assessment.
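The abstract reports each reader's or model's performance as an AUC together with a sensitivity and a specificity. As a rough illustration of how these three figures relate, the sketch below computes them from per-patient malignancy scores. It is a minimal sketch under stated assumptions, not the authors' evaluation code. The function names, the 0.5 decision threshold, and the eight-patient toy data are all hypothetical. The AUC uses the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve. The sketch does not reproduce the statistical test behind the reported P values, which the abstract does not name.

```python
from __future__ import annotations

from typing import Sequence


def sensitivity_specificity(labels: Sequence[int], scores: Sequence[float],
                            threshold: float = 0.5) -> tuple[float, float]:
    """Sensitivity and specificity when a score >= threshold is called malignant.

    Labels: 1 = malignant (including borderline), 0 = benign.
    The 0.5 default threshold is a hypothetical choice for illustration.
    Assumes at least one malignant and one benign case (else division by zero).
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random malignant case outscores a random benign one.

    Ties between a malignant and a benign score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 3 malignant, 5 benign cases.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(labels, scores)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
# prints: sensitivity=0.667 specificity=0.800 AUC=0.933
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes how well the scores rank malignant above benign cases across all thresholds. This is why two models with similar AUCs can still report different sensitivity and specificity pairs.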

Keywords