Communications Biology (Oct 2024)

Imputing spatial transcriptomics through gene network constructed from protein language model

  • Yuansong Zeng,
  • Yujie Song,
  • Chengyang Zhang,
  • Haoxuan Li,
  • Yongkang Zhao,
  • Weijiang Yu,
  • Shiqi Zhang,
  • Hongyu Zhang,
  • Zhiming Dai,
  • Yuedong Yang

DOI
https://doi.org/10.1038/s42003-024-06964-2
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 12

Abstract

Image-based spatial transcriptomic sequencing technologies measure gene expression at single-cell resolution, but only for a limited number of genes. Current computational approaches attempt to overcome this limitation by imputing the unmeasured genes, but they neglect gene-gene relationships, which limits prediction accuracy and the identification of cell populations. We present stImpute, a method that imputes spatial transcriptomics from reference scRNA-seq data using a gene network constructed from the protein language model ESM-2. Specifically, stImpute employs an autoencoder to create gene expression embeddings for both spatial transcriptomics and scRNA-seq data, and uses these embeddings to identify the nearest neighboring cells between the two datasets. Based on these neighboring cells, the gene expression of spatial transcriptomics cells is imputed through a graph neural network in which nodes are genes and edges are based on the cosine similarity between the ESM-2 embeddings of the gene-encoding proteins. The uncertainty of the gene predictions is further measured through a deep learning model. stImpute consistently outperformed state-of-the-art methods across multiple datasets in both imputation and clustering, and its results were robust to the choice of model parameters.
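
To make the gene-network step concrete, the following is a minimal sketch of how a graph with genes as nodes could be derived from cosine similarity between per-gene embedding vectors. It is not the stImpute implementation: the abstract states only that edges are based on cosine similarity, so the top-k neighbor rule, the function names, and the value of k are assumptions made for illustration, and random vectors stand in for real ESM-2 protein embeddings.

```python
import numpy as np


def cosine_similarity_matrix(emb: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity between the rows of `emb`."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def build_gene_graph(emb: np.ndarray, k: int = 2) -> np.ndarray:
    """Build a symmetric 0/1 adjacency matrix over genes.

    Assumption (not from the paper): each gene is linked to its k most
    similar genes, and the result is symmetrized.
    """
    sim = cosine_similarity_matrix(emb)
    np.fill_diagonal(sim, -np.inf)  # a gene is never its own neighbor
    n_genes = sim.shape[0]
    top_k = np.argsort(-sim, axis=1)[:, :k]  # k most similar genes per row
    adjacency = np.zeros((n_genes, n_genes), dtype=np.int8)
    rows = np.repeat(np.arange(n_genes), k)
    adjacency[rows, top_k.ravel()] = 1
    return np.maximum(adjacency, adjacency.T)  # make edges undirected


# Toy stand-in for ESM-2 embeddings: 5 genes, 8 dimensions each.
rng = np.random.default_rng(0)
toy_embeddings = rng.normal(size=(5, 8))
adjacency = build_gene_graph(toy_embeddings, k=2)
```

An adjacency matrix of this kind is the sort of structure a graph neural network would consume. A fixed similarity threshold would be an equally plausible way to turn similarities into edges.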