BMC Bioinformatics (Aug 2020)

A sparse Bayesian factor model for the construction of gene co-expression networks from single-cell RNA sequencing count data

  • Michael Sekula,
  • Jeremy Gaskins,
  • Susmita Datta

DOI
https://doi.org/10.1186/s12859-020-03707-y
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 19

Abstract

Background: Gene co-expression networks (GCNs) are powerful tools that enable biologists to examine associations between genes during different biological processes. With the advancement of new technologies, such as single-cell RNA sequencing (scRNA-seq), there is a need for developing novel network methods appropriate for new types of data.

Results: We present a novel sparse Bayesian factor model to explore the network structure associated with genes in scRNA-seq data. Latent factors impact the gene expression values for each cell and provide flexibility to account for common features of scRNA-seq: high proportions of zero values, increased cell-to-cell variability, and overdispersion due to abnormally large expression counts. From our model, we construct a GCN by analyzing the positive and negative associations of the factors that are shared between each pair of genes.

Conclusions: Simulation studies demonstrate that our methodology has high power in identifying gene-gene associations while maintaining a nominal false discovery rate. In real data analyses, our model identifies more known and predicted protein-protein interactions than other competing network models.
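The idea of linking genes through shared factors can be illustrated with a toy sketch. This is not the authors' Bayesian model or inference procedure: the loading matrix, the gene names, the function `shared_factor_edges`, and the sign-counting rule are all hypothetical, chosen only to show how same-sign and opposite-sign loadings on a shared factor could be turned into positive and negative edges.

```python
from itertools import combinations


def shared_factor_edges(loadings, genes):
    """Toy rule (an assumption, not the paper's method).

    loadings[g][k] is the loading of gene g on factor k; a value of 0.0
    means the gene is not associated with that factor (sparsity).
    For each gene pair, every shared factor votes +1 if the two loadings
    have the same sign and -1 if they have opposite signs.
    """
    edges = {}
    for i, j in combinations(range(len(genes)), 2):
        score = 0
        for a, b in zip(loadings[i], loadings[j]):
            if a != 0 and b != 0:  # factor is shared by both genes
                score += 1 if (a > 0) == (b > 0) else -1
        if score != 0:  # keep an edge only when the votes do not cancel
            edges[(genes[i], genes[j])] = "positive" if score > 0 else "negative"
    return edges


# Hypothetical sparse loadings: 3 genes (rows) by 3 latent factors (columns).
genes = ["geneA", "geneB", "geneC"]
loadings = [
    [0.8, 0.0, -0.5],
    [0.6, 0.3, 0.0],
    [0.0, -0.4, 0.7],
]

edges = shared_factor_edges(loadings, genes)
for pair, sign in edges.items():
    print(pair, sign)
```

In this toy input, geneA and geneB share one factor with same-sign loadings, so they receive a positive edge, while the other two pairs share a factor with opposite-sign loadings and receive negative edges. A real analysis would work with posterior samples of the loadings and a false discovery rate criterion rather than fixed point values.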

Keywords