Heliyon (Jul 2024)

Deep learning for automated scoring of immunohistochemically stained tumour tissue sections – Validation across tumour types based on patient outcomes

  • Wanja Kildal,
  • Karolina Cyll,
  • Joakim Kalsnes,
  • Rakibul Islam,
  • Frida M. Julbø,
  • Manohar Pradhan,
  • Elin Ersvær,
  • Neil Shepherd,
  • Ljiljana Vlatkovic,
  • Xavier Tekpli,
  • Øystein Garred,
  • Gunnar B. Kristensen,
  • Hanne A. Askautrud,
  • Tarjei S. Hveem,
  • Håvard E. Danielsen,
  • Tone F. Bathen,
  • Elin Borgen,
  • Anne-Lise Børresen-Dale,
  • Olav Engebråten,
  • Britt Fritzman,
  • Olaf Johan Hartman-Johnsen,
  • Øystein Garred,
  • Jürgen Geisler,
  • Gry Aarum Geitvik,
  • Solveig Hofvind,
  • Rolf Kåresen,
  • Anita Langerød,
  • Ole Christian Lingjærde,
  • Gunhild M. Mælandsmo,
  • Bjørn Naume,
  • Hege G. Russnes,
  • Kristine Kleivi Sahlberg,
  • Torill Sauer,
  • Helle Kristine Skjerven,
  • Ellen Schlichting,
  • Therese Sørlie

Journal volume & issue
Vol. 10, no. 13
p. e32529

Abstract

We aimed to develop deep learning (DL) models to detect protein expression in immunohistochemically (IHC) stained tissue-sections, and to compare their accuracy and performance with manually scored clinically relevant proteins in common cancer types. Five cancer patient cohorts (colon, two prostate, breast, and endometrial) were included. We developed separate DL models for scoring IHC-stained tissue-sections with nuclear, cytoplasmic, and membranous staining patterns. For training, we used images with annotations of cells with positive and negative staining from the colon cohort stained for Ki-67 and PMS2 (nuclear model), and from prostate cohort 1 stained for PTEN (cytoplasmic model) and β-catenin (membranous model). The nuclear DL model was validated for MSH6 in the colon, MSH6 and PMS2 in the endometrium, Ki-67 and CyclinB1 in prostate, and oestrogen and progesterone receptors in the breast cancer cohorts. The cytoplasmic DL model was validated for PTEN and Mapre2, and the membranous DL model for CD44 and Flotillin1, all in prostate cohorts. When comparing the results of manual and DL scores in the validation sets, using manual scores as the ground truth, we observed an average correct classification rate of 91.5 % (76.9–98.5 %) for the nuclear model, 85.6 % (73.3–96.6 %) for the cytoplasmic model, and 78.4 % (75.5–84.3 %) for the membranous model. In survival analyses, manual and DL scores showed similar prognostic impact, with similar hazard ratios and p-values for all DL models. Our findings demonstrate that DL models offer a promising alternative to manual IHC scoring, providing efficiency and reproducibility across various data sources and markers.
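
As a minimal sketch of the agreement measure named above, the correct classification rate can be read as the percentage of cases in which the DL category matches the manual category, with the manual score treated as ground truth. The function name, the category labels, and the five example cases below are hypothetical and chosen only for illustration; they are not the study's data or code.

```python
from typing import Sequence


def correct_classification_rate(manual: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the percentage of cases where the predicted category
    equals the manual (ground-truth) category."""
    if len(manual) != len(predicted):
        raise ValueError("manual and predicted must have the same length")
    if len(manual) == 0:
        raise ValueError("at least one case is required")
    # Count the cases where both scoring methods assign the same category.
    matches = sum(m == p for m, p in zip(manual, predicted))
    return 100.0 * matches / len(manual)


# Hypothetical per-patient categories (illustration only, not study data).
manual_scores = ["positive", "negative", "positive", "negative", "positive"]
dl_scores = ["positive", "negative", "negative", "negative", "positive"]

rate = correct_classification_rate(manual_scores, dl_scores)
print(f"Correct classification rate: {rate:.1f} %")
```

Under this reading, a per-model figure such as the 91.5 % reported for the nuclear model would be the mean of such rates over the validated markers, with the bracketed values giving the lowest and highest per-marker rate; that interpretation is an assumption based on the wording of the abstract.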

Keywords