iScience (May 2023)

Generative pretraining from large-scale transcriptomes for single-cell deciphering

  • Hongru Shen,
  • Jilei Liu,
  • Jiani Hu,
  • Xilin Shen,
  • Chao Zhang,
  • Dan Wu,
  • Mengyao Feng,
  • Meng Yang,
  • Yang Li,
  • Yichen Yang,
  • Wei Wang,
  • Qiang Zhang,
  • Jilong Yang,
  • Kexin Chen,
  • Xiangchun Li

Journal volume & issue
Vol. 26, no. 5
p. 106536

Abstract

Summary: The exponential accumulation of single-cell transcriptomes poses a great challenge for efficient assimilation. Here, we present an approach entitled generative pretraining from transcriptomes (tGPT) for learning feature representations of transcriptomes. tGPT is conceptually simple in that it autoregressively models the ranking of a gene in the context of its preceding neighbors. We developed tGPT with 22.3 million single-cell transcriptomes and used four single-cell datasets to evaluate its performance on single-cell analysis tasks. In addition, we examined its applications to bulk tissues. The single-cell clusters and cell lineage trajectories derived from tGPT are highly aligned with known cell labels and states. The feature patterns of tumor bulk tissues learned by tGPT are associated with a wide range of genomic alteration events, prognosis, and immunotherapy treatment outcomes. tGPT represents a new analytical paradigm for integrating and deciphering massive amounts of transcriptome data, and it will facilitate the interpretation and clinical translation of single-cell transcriptomes.
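
The idea of modeling a gene's rank in the context of its preceding neighbors can be illustrated with a minimal sketch. It is not the authors' implementation: the descending sort order, the dropping of unexpressed genes, the optional `top_k` cutoff, the function names, and the toy gene panel are all assumptions made for illustration. The sketch turns one cell's expression vector into a ranked gene sequence and then lists the (context, next gene) pairs that an autoregressive model would be trained to predict.

```python
import numpy as np


def rank_genes(expression, gene_names, top_k=None):
    """Order gene names by descending expression (assumed ranking scheme)."""
    expression = np.asarray(expression, dtype=float)
    # Stable sort keeps the input order for genes with tied expression.
    order = np.argsort(-expression, kind="stable")
    # Assumption: genes with zero expression carry no rank and are dropped.
    order = [i for i in order if expression[i] > 0]
    if top_k is not None:
        order = order[:top_k]
    return [gene_names[i] for i in order]


def next_gene_pairs(ranked):
    """Pair each prefix of the ranked sequence with the gene that follows it."""
    return [(ranked[:i], ranked[i]) for i in range(1, len(ranked))]


# Hypothetical toy cell: five genes with made-up counts.
genes = ["CD3E", "MS4A1", "LYZ", "GNLY", "ACTB"]
counts = [5, 0, 12, 1, 40]

ranked = rank_genes(counts, genes)
pairs = next_gene_pairs(ranked)

for context, target in pairs:
    print(context, "->", target)
```

In this toy example each pair is one training signal: given the higher-ranked genes seen so far, predict the gene that comes next in the ranking. A real model would map gene names to token indices and learn these conditional probabilities with a neural network.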

Keywords