Patterns (Jun 2021)
ACSNI: An unsupervised machine-learning tool for prediction of tissue-specific pathway components using gene expression profiles
Abstract
Summary: Determining the tissue- and disease-specific circuit of biological pathways remains a fundamental goal of molecular biology. Many components of these pathways remain unknown, hindering the full and accurate characterization of biological processes of interest. Here we describe ACSNI, an algorithm that combines prior knowledge of biological processes with a deep neural network to decompose gene expression profiles (GEPs) into multi-variable pathway activities and identify unknown pathway components. Experiments on public GEP data show that ACSNI predicts cogent components of mTOR, ATF2, and HOTAIRM1 signaling that recapitulate regulatory information from genetic perturbation and transcription factor binding datasets. Our framework provides a fast and easy-to-use method to identify components of signaling pathways, both as a tool for molecular mechanism discovery and as a way to prioritize genes when designing future targeted experiments (https://github.com/caanene1/ACSNI).
The bigger picture: Methods that group genes into functional units to quantify pathway activities are critical in the analysis of biological systems. Although many components of biological pathways have been described in detail, these tend to be limited to well-studied genes, and the majority of possible components remain unexplored. Here, we present a machine-learning tool for constructing and predicting tissue-specific components of biological pathways from large biological datasets. Our algorithm, ACSNI, can tackle incomplete pathway descriptions and enhance the performance of current pathway analysis methods. We anticipate that, by dissecting the complex signals in biological data in a flexible and context-specific manner, ACSNI can facilitate the full characterization of physiological systems of interest.
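To make the decomposition idea in the summary concrete, the following is a minimal sketch and not the ACSNI implementation. It assumes a genes-by-samples expression matrix and a set of known pathway genes, and it deliberately swaps the deep neural network for a truncated SVD so that the example stays short. The function name `pathway_activity_sketch` and its parameters are hypothetical and do not come from the ACSNI package.

```python
import numpy as np

def pathway_activity_sketch(expr, prior_idx, n_components=2):
    """Illustrative only: SVD stands in for ACSNI's neural network.

    expr      : (n_genes, n_samples) expression matrix (hypothetical input)
    prior_idx : row indices of genes already known to be in the pathway
    Returns per-sample activities and a per-gene association score.
    """
    # Standardize each gene across samples (constant genes are left at zero).
    z = expr - expr.mean(axis=1, keepdims=True)
    sd = z.std(axis=1, keepdims=True)
    z = z / np.where(sd == 0, 1.0, sd)

    # Decompose the prior-gene submatrix; the leading right singular
    # vectors act as multi-variable, per-sample pathway activities.
    _, _, vt = np.linalg.svd(z[prior_idx], full_matrices=False)
    activity = vt[:n_components]  # shape: (n_components, n_samples)

    # Correlate every gene (inside or outside the prior set) with each activity.
    act = activity - activity.mean(axis=1, keepdims=True)
    act = act / np.maximum(np.linalg.norm(act, axis=1, keepdims=True), 1e-12)
    zn = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    corr = zn @ act.T  # shape: (n_genes, n_components)

    # Score each gene by its strongest absolute correlation with any activity.
    score = np.abs(corr).max(axis=1)
    return activity, score
```

Genes outside the prior set that score highly under such a scheme would be candidate pathway components in the given tissue context. The published method relies on a deep neural network and its own scoring procedure, so this sketch conveys only the general shape of the approach.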