PLoS Genetics (Mar 2024)

INSIDER: Interpretable sparse matrix decomposition for RNA expression data analysis.

  • Kai Zhao,
  • Sen Huang,
  • Cuichan Lin,
  • Pak Chung Sham,
  • Hon-Cheong So,
  • Zhixiang Lin

DOI
https://doi.org/10.1371/journal.pgen.1011189
Journal volume & issue
Vol. 20, no. 3
p. e1011189

Abstract

Read online

RNA sequencing (RNA-Seq) is widely used to capture transcriptome dynamics across tissues, biological entities, and conditions. Currently, few or no methods can handle multiple biological variables (e.g., tissues/ phenotypes) and their interactions simultaneously, while also achieving dimension reduction (DR). We propose INSIDER, a general and flexible statistical framework based on matrix factorization, which is freely available at https://github.com/kai0511/insider. INSIDER decomposes variation from different biological variables and their interactions into a shared low-rank latent space. Particularly, it introduces the elastic net penalty to induce sparsity while considering the grouping effects of genes. It can achieve DR of high-dimensional data (of > = 3 dimensions), as opposed to conventional methods (e.g., PCA/NMF) which generally only handle 2D data (e.g., sample × expression). Besides, it enables computing 'adjusted' expression profiles for specific biological variables while controlling variation from other variables. INSIDER is computationally efficient and accommodates missing data. INSIDER also performed similarly or outperformed a close competing method, SDA, as shown in simulations and can handle complex missing data in RNA-Seq data. Moreover, unlike SDA, it can be used when the data cannot be structured into a tensor. Lastly, we demonstrate its usefulness via real data analysis, including clustering donors for disease subtyping, revealing neuro-development trajectory using the BrainSpan data, and uncovering biological processes contributing to variables of interest (e.g., disease status and tissue) and their interactions.