Advanced Science (May 2024)
scmFormer Integrates Large‐Scale Single‐Cell Proteomics and Transcriptomics Data by Multi‐Task Transformer
Abstract
Transformer‐based models have revolutionized single‐cell RNA‐seq (scRNA‐seq) data analysis. However, their applicability is challenged by the complexity and scale of single‐cell multi‐omics data. Here, a novel single‐cell multi‐modal/multi‐task transformer (scmFormer) is proposed to fill the existing gap in integrating single‐cell proteomics with other omics data. Systematic benchmarking demonstrates that scmFormer excels at integrating large‐scale single‐cell multimodal data and heterogeneous multi‐batch paired multi‐omics data, while preserving both the information shared across batches and distinct biological information. In transferring cell‐type labels from single‐cell transcriptomics to proteomics data, scmFormer achieves a 54.5% higher average F1 score than the second‐best method. On COVID‐19 datasets, scmFormer successfully integrates over 1.48 million cells on a personal computer. Moreover, scmFormer outperforms existing methods at generating the unmeasured modality and is well‐suited to spatial multi‐omics data. Thus, scmFormer is a powerful and comprehensive tool for analyzing single‐cell multi‐omics data.
Keywords