Generative pretraining from large-scale transcriptomes for single-cell deciphering
Hongru Shen,
Jilei Liu,
Jiani Hu,
Xilin Shen,
Chao Zhang,
Dan Wu,
Mengyao Feng,
Meng Yang,
Yang Li,
Yichen Yang,
Wei Wang,
Qiang Zhang,
Jilong Yang,
Kexin Chen,
Xiangchun Li
Affiliations
Hongru Shen
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Jilei Liu
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Jiani Hu
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Xilin Shen
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Chao Zhang
Department of Bone and Soft Tissue Tumor, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Dan Wu
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Mengyao Feng
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Meng Yang
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Yang Li
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Yichen Yang
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Wei Wang
Department of Epidemiology and Biostatistics, Tianjin’s Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Qiang Zhang
Department of Maxillofacial and Otorhinolaryngology Oncology, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Jilong Yang
Department of Bone and Soft Tissue Tumor, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China
Kexin Chen
Department of Epidemiology and Biostatistics, Tianjin’s Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China; Corresponding author
Xiangchun Li
Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin, China; Corresponding author
Summary: The exponential accumulation of single-cell transcriptomes poses a great challenge for their efficient assimilation. Here, we present an approach termed generative pretraining from transcriptomes (tGPT) for learning feature representations of transcriptomes. tGPT is conceptually simple in that it autoregressively models the ranking of a gene in the context of its preceding neighbors. We developed tGPT with 22.3 million single-cell transcriptomes and used four single-cell datasets to evaluate its performance on single-cell analysis tasks. In addition, we examined its applications to bulk tissues. The single-cell clusters and cell lineage trajectories derived from tGPT are highly aligned with known cell labels and states. The feature patterns of tumor bulk tissues learned by tGPT are associated with a wide range of genomic alteration events, prognosis, and treatment outcomes of immunotherapy. tGPT represents a new analytical paradigm for integrating and deciphering massive amounts of transcriptome data, and it will facilitate the interpretation and clinical translation of single-cell transcriptomes.
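The autoregressive formulation described in the summary can be illustrated with a minimal sketch: a cell's genes are ordered by expression, and each gene becomes a prediction target given the genes ranked before it. The details here are assumptions made for illustration and are not taken from the summary: the function names `rank_genes` and `next_gene_pairs`, the descending sort order, the exclusion of non-expressed genes, the `top_k` truncation, and the toy expression values are all hypothetical.

```python
from typing import Dict, List, Tuple


def rank_genes(expression: Dict[str, float], top_k: int = 4) -> List[str]:
    """Return gene names ordered by descending expression.

    Assumption: genes with zero expression are dropped and only the
    top_k genes are kept; ties are broken alphabetically.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [gene for gene, _ in expressed[:top_k]]


def next_gene_pairs(ranked: List[str]) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs for next-gene prediction.

    Each target is the gene at rank i; its context is every gene
    ranked before it, which is the autoregressive setup.
    """
    return [(ranked[:i], ranked[i]) for i in range(1, len(ranked))]


# Toy expression profile for a single cell (hypothetical values).
cell = {"CD3E": 5.0, "GAPDH": 9.0, "MS4A1": 0.0, "ACTB": 7.0, "CD8A": 2.0}

ranked = rank_genes(cell)
pairs = next_gene_pairs(ranked)

for context, target in pairs:
    print(f"context={context} -> target={target}")
```

In this sketch a sequence model would be trained to assign high probability to each target gene given its context; the model itself is omitted because the summary does not specify its architecture.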