Scientific Reports (Jan 2024)

Predicting cell types with supervised contrastive learning on cells and their types

  • Yusri Dwi Heryanto,
  • Yao-zhong Zhang,
  • Seiya Imoto

DOI
https://doi.org/10.1038/s41598-023-50185-2
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 16

Abstract

Read online

Abstract Single-cell RNA-sequencing (scRNA-seq) is a powerful technique that provides high-resolution expression profiling of individual cells. It significantly advances our understanding of cellular diversity and function. Despite its potential, the analysis of scRNA-seq data poses considerable challenges related to multicollinearity, data imbalance, and batch effect. One of the pivotal tasks in single-cell data analysis is cell type annotation, which classifies cells into discrete types based on their gene expression profiles. In this work, we propose a novel modeling formalism for cell type annotation with a supervised contrastive learning method, named SCLSC (Supervised Contrastive Learning for Single Cell). Different from the previous usage of contrastive learning in single cell data analysis, we employed the contrastive learning for instance-type pairs instead of instance-instance pairs. More specifically, in the cell type annotation task, the contrastive learning is applied to learn cell and cell type representation that render cells of the same type to be clustered in the new embedding space. Through this approach, the knowledge derived from annotated cells is transferred to the feature representation for scRNA-seq data. The whole training process becomes more efficient when conducting contrastive learning for cell and their types. Our experiment results demonstrate that the proposed SCLSC method consistently achieves superior accuracy in predicting cell types compared to five state-of-the-art methods. SCLSC also performs well in identifying cell types in different batch groups. The simplicity of our method allows for scalability, making it suitable for analyzing datasets with a large number of cells. In a real-world application of SCLSC to monitor the dynamics of immune cell subpopulations over time, SCLSC demonstrates a capability to discriminate cell subtypes of CD19+ B cells that were not present in the training dataset.