Journal of Pathology Informatics (Dec 2024)

A selective CutMix approach improves generalizability of deep learning-based grading and risk assessment of prostate cancer

  • Sushant Patkar,
  • Stephanie Harmon,
  • Isabell Sesterhenn,
  • Rosina Lis,
  • Maria Merino,
  • Denise Young,
  • G. Thomas Brown,
  • Kimberly M. Greenfield,
  • John D. McGeeney,
  • Sally Elsamanoudi,
  • Shyh-Han Tan,
  • Cara Schafer,
  • Jiji Jiang,
  • Gyorgy Petrovics,
  • Albert Dobi,
  • Francisco J. Rentas,
  • Peter A. Pinto,
  • Gregory T. Chesnut,
  • Peter Choyke,
  • Baris Turkbey,
  • Joel T. Moncur

Journal volume & issue
Vol. 15
p. 100381

Abstract

The Gleason score is an important predictor of prognosis in prostate cancer. However, its subjective nature can result in over- or under-grading. Our objective was to train an artificial intelligence (AI)-based algorithm to grade prostate cancer in specimens from patients who underwent radical prostatectomy (RP) and to assess the correlation of AI-estimated proportions of different Gleason patterns with biochemical recurrence-free survival (RFS), metastasis-free survival (MFS), and overall survival (OS). Training and validation of algorithms for cancer detection and grading were completed with three large datasets containing a total of 580 whole-mount prostate slides from 191 RP patients at two centers and 6218 annotated needle biopsy slides from the publicly available Prostate Cancer Grading Assessment dataset. A cancer detection model was trained using MobileNetV3 on 0.5 mm × 0.5 mm cancer areas (tiles) captured at 10× magnification. For cancer grading, a Gleason pattern detector was trained on tiles using a ResNet50 convolutional neural network and a selective CutMix training strategy involving a mixture of real and artificial examples. This strategy resulted in improved model generalizability in the test set compared with three different control experiments when evaluated on both needle biopsy slides and whole-mount prostate slides from different centers. In an additional test cohort of RP patients who were clinically followed over 30 years, quantitative Gleason pattern AI estimates achieved concordance indexes of 0.69, 0.72, and 0.64 for predicting RFS, MFS, and OS times, outperforming the control experiments and International Society of Urological Pathology system (ISUP) grading by pathologists. Finally, unsupervised clustering of test RP patient specimens into low-, medium-, and high-risk groups based on AI-estimated proportions of each Gleason pattern resulted in significantly improved RFS and MFS stratification compared with ISUP grading. 
In summary, deep learning-based quantitative Gleason scoring using a selective CutMix training strategy may improve prognostication after prostate cancer surgery.
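The concordance indexes quoted above (0.69, 0.72, and 0.64) measure how well a risk score orders patients by time to event. As an illustration only, the following is a minimal sketch of Harrell's concordance index on right-censored data. It is not the authors' evaluation code: the function name, the toy data, and the simplification of skipping pairs with tied follow-up times are all assumptions made for this example, and the abstract does not say how the AI-estimated Gleason pattern proportions were turned into a single risk score.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C for right-censored survival data (simplified sketch).

    times  : follow-up time per patient
    events : 1 if the event (e.g. recurrence) was observed, 0 if censored
    risks  : predicted risk score, higher meaning earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied follow-up times are skipped here
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, one censored at time 12.
times = [5, 10, 12, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]
c = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported indexes should be read.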

Keywords