npj Breast Cancer (Nov 2022)

Deep learning for fully-automated nuclear pleomorphism scoring in breast cancer

  • Caner Mercan,
  • Maschenka Balkenhol,
  • Roberto Salgado,
  • Mark Sherman,
  • Philippe Vielh,
  • Willem Vreuls,
  • António Polónia,
  • Hugo M. Horlings,
  • Wilko Weichert,
  • Jodi M. Carter,
  • Peter Bult,
  • Matthias Christgen,
  • Carsten Denkert,
  • Koen van de Vijver,
  • John-Melle Bokhorst,
  • Jeroen van der Laak,
  • Francesco Ciompi

DOI
https://doi.org/10.1038/s41523-022-00488-w
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 11

Abstract

To guide the choice of treatment, every new breast cancer is assessed for aggressiveness (i.e., graded) by an experienced histopathologist. Typically, this tumor grade consists of three components, one of which is the nuclear pleomorphism score (the extent of abnormalities in the overall appearance of tumor nuclei). The degree of nuclear pleomorphism is subjectively classified from 1 to 3, where a score of 1 most closely resembles the epithelial cells of normal breast tissue and a score of 3 shows the greatest abnormalities. Establishing numerical criteria for grading nuclear pleomorphism is challenging, and inter-observer agreement is poor. Therefore, we studied the use of deep learning to develop fully automated nuclear pleomorphism scoring in breast cancer. The reference standard used for training the algorithm consisted of the collective knowledge of an international panel of 10 pathologists on a curated set of regions of interest covering the entire spectrum of tumor morphology in breast cancer. To fully exploit the information provided by the pathologists, a first-of-its-kind deep regression model was trained to yield a continuous score rather than limiting pleomorphism scoring to the standard three-tiered system. Our approach preserves the continuum of nuclear pleomorphism without requiring a large data set with explicit annotations of tumor nuclei. Once translated to the traditional system, our approach achieves top pathologist-level performance in multiple experiments on regions of interest and whole-slide images, compared to a panel of 10 and 4 pathologists, respectively.
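
As an illustration of the two ideas described above (a continuous target derived from several pathologists' scores, and a translation of a continuous score back to the three-tiered system), a minimal Python sketch follows. It is not the authors' implementation: the abstract does not state how the panel's scores were combined or which cutoffs were used, so the averaging step, the function names `panel_target` and `to_three_tier`, and the cutoffs 1.5 and 2.5 are all assumptions made for illustration only.

```python
from statistics import mean


def panel_target(scores):
    """Combine several pathologists' 1-3 scores into one continuous value.

    Assumption: a plain average is used here; the source does not say
    how the panel's scores were aggregated.
    """
    if not scores or any(s not in (1, 2, 3) for s in scores):
        raise ValueError("each panel score must be 1, 2, or 3")
    return mean(scores)


def to_three_tier(continuous_score, cutoffs=(1.5, 2.5)):
    """Map a continuous pleomorphism score to the traditional 1-3 system.

    Assumption: the cutoffs (1.5, 2.5) are hypothetical midpoints,
    not values taken from the paper.
    """
    low, high = cutoffs
    if continuous_score < low:
        return 1
    if continuous_score < high:
        return 2
    return 3


if __name__ == "__main__":
    # Hypothetical scores from a four-member panel for one region of interest.
    target = panel_target([1, 2, 2, 3])
    print(target, to_three_tier(target))
```

Keeping the continuous value and discretizing only at the end means the three-tier label can be recomputed with different cutoffs without retraining or rescoring.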