Self-supervised deep learning for highly efficient spatial immunophenotyping
Hanyun Zhang,
Khalid AbdulJabbar,
Tami Grunewald,
Ayse U. Akarca,
Yeman Hagos,
Faranak Sobhani,
Catherine S.Y. Lecat,
Dominic Patel,
Lydia Lee,
Manuel Rodriguez-Justo,
Kwee Yong,
Jonathan A. Ledermann,
John Le Quesne,
E. Shelley Hwang,
Teresa Marafioti,
Yinyin Yuan
Affiliations
Hanyun Zhang
Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK; Division of Molecular Pathology, The Institute of Cancer Research, London, UK
Khalid AbdulJabbar
Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK; Division of Molecular Pathology, The Institute of Cancer Research, London, UK
Tami Grunewald
Department of Oncology, UCL Cancer Institute, University College London, London, UK
Ayse U. Akarca
Department of Cellular Pathology, University College London Hospital, London, UK
Yeman Hagos
Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK; Division of Molecular Pathology, The Institute of Cancer Research, London, UK
Faranak Sobhani
Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK; Division of Molecular Pathology, The Institute of Cancer Research, London, UK
Catherine S.Y. Lecat
Research Department of Hematology, Cancer Institute, University College London, UK
Dominic Patel
Research Department of Hematology, Cancer Institute, University College London, UK
Lydia Lee
Research Department of Hematology, Cancer Institute, University College London, UK
Manuel Rodriguez-Justo
Research Department of Hematology, Cancer Institute, University College London, UK
Kwee Yong
Research Department of Hematology, Cancer Institute, University College London, UK
Jonathan A. Ledermann
Department of Oncology, UCL Cancer Institute, University College London, London, UK
John Le Quesne
School of Cancer Sciences, University of Glasgow, Glasgow, UK; CRUK Beatson Institute, Garscube Estate, Glasgow, UK; Department of Histopathology, Queen Elizabeth University Hospital, Glasgow, UK
E. Shelley Hwang
Department of Surgery, Duke University Medical Center, Durham, NC, USA
Teresa Marafioti
Department of Cellular Pathology, University College London Hospital, London, UK
Yinyin Yuan
Centre for Evolution and Cancer, The Institute of Cancer Research, London, UK; Division of Molecular Pathology, The Institute of Cancer Research, London, UK (corresponding author)
Summary
Background: Efficient biomarker discovery and clinical translation depend on fast, accurate analytical output from key technologies such as multiplex imaging. However, reliable cell classification often requires extensive annotations. Label-efficient strategies are urgently needed to reveal diverse cell distributions and spatial interactions in large-scale multiplex datasets.
Methods: We propose Self-supervised Learning for Antigen Detection (SANDI) for accurate cell phenotyping with a reduced annotation burden. The model first learns intrinsic pairwise similarities in unlabelled cell images; a classification step then maps the learnt features to cell labels using a small set of annotated references. To train and test the model, we acquired four multiplex immunohistochemistry datasets and one imaging mass cytometry dataset, comprising 2825 to 15,258 single-cell images.
Findings: With 1% annotations (18–114 cells), SANDI achieved weighted F1-scores ranging from 0.82 to 0.98 across the five datasets. This was comparable to a fully supervised classifier trained on 1828–11,459 annotated cells (difference in averaged weighted F1-score of −0.002 to −0.053; Wilcoxon rank-sum test, P = 0.31). Leveraging the immune checkpoint markers stained in ovarian cancer slides, SANDI-based cell identification revealed spatial expulsion between PD1-expressing T helper cells and T regulatory cells, suggesting an interplay between PD1 expression and T regulatory cell-mediated immunosuppression.
Interpretation: By balancing minimal expert guidance against the power of deep learning to learn similarity within abundant data, SANDI presents new opportunities for efficient, large-scale learning on histology multiplex imaging data.
Funding: This study was funded by the Royal Marsden/ICR National Institute of Health Research Biomedical Research Centre.
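The reference-based classification step described under Methods and the weighted F1-score reported under Findings can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the study: cell embeddings are taken as given (the self-supervised encoder is not shown), each cell receives the label of its most similar annotated reference under cosine similarity (a stand-in for whatever classifier the authors used), and the function names `assign_labels` and `weighted_f1` and the labels `type_A` and `type_B` are hypothetical.

```python
import numpy as np


def assign_labels(embeddings, ref_embeddings, ref_labels):
    """Give each cell the label of its most similar annotated reference.

    Assumption: similarity is cosine similarity between feature vectors.
    This is an illustrative stand-in, not the classifier used in the study.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    ref_embeddings = np.asarray(ref_embeddings, dtype=float)
    # L2-normalise rows so the dot product equals cosine similarity.
    e_norm = np.maximum(np.linalg.norm(embeddings, axis=1, keepdims=True), 1e-12)
    r_norm = np.maximum(np.linalg.norm(ref_embeddings, axis=1, keepdims=True), 1e-12)
    similarity = (embeddings / e_norm) @ (ref_embeddings / r_norm).T
    # Index of the best-matching reference for every cell.
    nearest = similarity.argmax(axis=1)
    return np.asarray(ref_labels)[nearest]


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's true count."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    total = 0.0
    for cls in np.unique(y_true):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * np.sum(y_true == cls)
    return float(total / len(y_true))


# Toy usage with hypothetical 2-D embeddings and two annotated references.
references = np.array([[1.0, 0.0], [0.0, 1.0]])
reference_labels = ["type_A", "type_B"]
cells = np.array([[0.9, 0.1], [0.2, 0.8], [2.0, 0.1]])
predicted = assign_labels(cells, references, reference_labels)
score = weighted_f1(["type_A", "type_B", "type_A"], predicted)
```

Weighting each class's F1 by its true count keeps rare cell types from being averaged on equal footing with abundant ones, which matters when a 1% annotation budget yields only a handful of references per class.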