iScience (Apr 2024)

scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics

  • Yuchen Wang,
  • Xingjian Chen,
  • Zetian Zheng,
  • Lei Huang,
  • Weidun Xie,
  • Fuzhou Wang,
  • Zhaolei Zhang,
  • Ka-Chun Wong

Journal volume & issue
Vol. 27, no. 4
p. 109352

Abstract

Summary: Gene regulatory networks (GRNs) involve complex, multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important for understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology have made it possible to infer GRNs at single-cell resolution. Existing methods, however, are limited by high computational cost and, in some cases, simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework that infers GRNs from single-cell transcriptomics using gene embeddings and a transformer. scGREAT first constructs a gene expression dictionary from scRNA-seq data and a gene biotext dictionary from gene text information. The representations of transcription factor (TF)-target gene pairs are then learned by optimizing the embedding space with a transformer-based engine. Results show that scGREAT outperformed other contemporary methods on benchmarks. In addition, the gene representations learned by scGREAT provide valuable insights into gene regulation, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT's annotations. Moreover, scGREAT identified several TF-target regulations corroborated by existing studies.
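The summary describes the pipeline only at a high level. The following is a toy sketch of the general idea, not the authors' implementation. Each gene is represented by concatenating its expression profile (standing in for the expression dictionary) with a text embedding (standing in for the biotext dictionary). A TF-target pair is then treated as a two-token sequence, passed through one self-attention layer, and mapped to a regulation probability. All function names, dimensions, the random data, and the untrained weights are hypothetical assumptions; a real transformer would use learned Q/K/V projections, several layers, and weights fit during training.

```python
import math
import random

# Hypothetical toy inputs: random per-gene expression profiles across cells
# (stand-in for the expression dictionary) and random per-gene text vectors
# (stand-in for the biotext dictionary).
def build_dictionaries(genes, n_cells, text_dim, seed=0):
    rng = random.Random(seed)
    expr = {g: [rng.random() for _ in range(n_cells)] for g in genes}
    text = {g: [rng.gauss(0.0, 1.0) for _ in range(text_dim)] for g in genes}
    return expr, text

def token(gene, expr, text):
    # One token per gene: expression profile followed by text embedding.
    return expr[gene] + text[gene]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(tokens):
    # Single-head scaled dot-product attention with identity projections
    # (Q = K = V = tokens), a simplification of a real transformer layer.
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(scores)
        out.append([sum(w[j] * tokens[j][i] for j in range(len(tokens)))
                    for i in range(d)])
    return out

def score_pair(tf, target, expr, text, weights, bias=0.0):
    # Two-token sequence [TF, target] -> attention -> concatenation in
    # TF-then-target order -> logistic output as a regulation probability.
    attended = self_attention([token(tf, expr, text), token(target, expr, text)])
    features = attended[0] + attended[1]
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# Usage with untrained, hypothetical weights (illustration only).
genes = ["TF1", "GeneA", "GeneB"]
expr, text = build_dictionaries(genes, n_cells=8, text_dim=4)
rng = random.Random(1)
weights = [rng.gauss(0.0, 0.1) for _ in range(2 * (8 + 4))]
print(f"P(TF1 regulates GeneA) = {score_pair('TF1', 'GeneA', expr, text, weights):.3f}")
```

Concatenating the two attended tokens in TF-then-target order, rather than averaging them, keeps the score sensitive to regulatory direction, because self-attention without positional or role information is permutation-equivariant.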

Keywords