scGREAT: Transformer-based deep-language model for gene regulatory network inference from single-cell transcriptomics

Yuchen Wang; Xingjian Chen; Zetian Zheng; Lei Huang; Weidun Xie; Fuzhou Wang; Zhaolei Zhang; Ka-Chun Wong

iScience (Apr 2024)

Affiliations

Yuchen Wang: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Xingjian Chen: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR; Cutaneous Biology Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
Zetian Zheng: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Lei Huang: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Weidun Xie: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Fuzhou Wang: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR
Zhaolei Zhang: Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada; Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada; Department of Computer Science, University of Toronto, Toronto, ON, Canada
Ka-Chun Wong: Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong SAR; Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China; Corresponding author

Journal volume & issue: Vol. 27, no. 4, p. 109352

Abstract

Summary: Gene regulatory networks (GRNs) involve complex, multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important for understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology have made it possible to infer GRNs at the single-cell level. Existing methods, however, are limited by expensive computation and sometimes simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework that infers GRNs from single-cell transcriptomics using gene embeddings and a transformer. scGREAT starts by constructing gene expression and gene biotext dictionaries from scRNA-seq data and gene text information. The representation of TF-gene pairs is learned by optimizing the embedding space with a transformer-based engine. Results show that scGREAT outperformed other contemporary methods on benchmarks. In addition, gene representations from scGREAT provide valuable insights into gene regulation, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT annotation. Moreover, scGREAT identified several TF-target regulations that are corroborated in published studies.
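
The pipeline described in the summary (two per-gene dictionaries, then an attention-based model over TF-gene pairs) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that each gene's token is the concatenation of its expression vector and a text-derived vector, that a single self-attention step with identity projections stands in for the transformer engine, and that an untrained random linear layer produces the score. The gene names `TF_A` and `GENE_B`, the function names, and all numbers are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections
    (an assumption made to keep the sketch short; a real transformer
    layer learns these projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def pair_score(tf, target, expr_dict, text_dict, w, b):
    """Score one TF-gene pair in (0, 1) from the two dictionaries."""
    # One token per gene: expression values followed by the text embedding.
    tokens = [expr_dict[g] + text_dict[g] for g in (tf, target)]
    attended = self_attention(tokens)
    # Mean-pool the two attended tokens into a single pair representation.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logit = sum(wi * xi for wi, xi in zip(w, pooled)) + b
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical toy dictionaries: expression across 4 cells, 2-d text embedding.
expr_dict = {"TF_A": [0.9, 0.1, 0.8, 0.2], "GENE_B": [0.8, 0.2, 0.7, 0.1]}
text_dict = {"TF_A": [0.5, -0.3], "GENE_B": [0.4, -0.2]}

rng = random.Random(0)
w = [rng.uniform(-1, 1) for _ in range(6)]  # untrained weights, illustration only
b = 0.0

score = pair_score("TF_A", "GENE_B", expr_dict, text_dict, w, b)
print(f"TF_A -> GENE_B regulation score: {score:.3f}")
```

Because the weights are random, the printed score carries no biological meaning. The sketch only shows how a pair representation might flow from the dictionaries through attention to a prediction.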

Published in iScience

ISSN: 2589-0042 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science
Website: http://www.cell.com/iscience/home
