iModulonMiner and PyModulon: Software for unsupervised mining of gene expression compendia.

Anand V Sastry; Yuan Yuan; Saugat Poudel; Kevin Rychel; Reo Yoo; Cameron R Lamoureux; Gaoyuan Li; Joshua T Burrows; Siddharth Chauhan; Zachary B Haiman; Tahani Al Bulushi; Yara Seif; Bernhard O Palsson; Daniel C Zielinski

doi:10.1371/journal.pcbi.1012546

PLoS Computational Biology (Oct 2024)

DOI: https://doi.org/10.1371/journal.pcbi.1012546
Journal volume & issue: Vol. 20, no. 10
p. e1012546

Abstract

Public gene expression databases are a rapidly expanding resource of organism responses to diverse perturbations, presenting both an opportunity and a challenge for bioinformatics workflows that aim to extract actionable knowledge of transcriptional regulatory network function. Here, we introduce a five-step computational pipeline, called iModulonMiner, to compile, process, curate, analyze, and characterize the totality of RNA-seq data for a given organism or cell type. This workflow is centered around the data-driven computation of co-regulated gene sets, called iModulons, using Independent Component Analysis (ICA); iModulons have been shown to have broad applications. As a demonstration, we applied this workflow to generate the iModulon structure of Bacillus subtilis using all high-quality, publicly available RNA-seq data. Using this structure, we predicted regulatory interactions for multiple transcription factors, identified groups of co-expressed genes that are putatively regulated by undiscovered transcription factors, and predicted properties of a recently discovered single-subunit phage RNA polymerase. We also present a Python package, PyModulon, with functions to characterize, visualize, and explore computed iModulons. The pipeline, available at https://github.com/SBRG/iModulonMiner, can be readily applied to diverse organisms to gain a rapid understanding of their transcriptional regulatory network structure and condition-specific activity.
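
To make the ICA step concrete, here is a minimal sketch of decomposing a genes-by-samples expression matrix into gene weights and per-sample activities. It is not the iModulonMiner or PyModulon implementation. It assumes scikit-learn's FastICA as a stand-in, and the synthetic data, the matrix sizes, the number of components, and the names `decompose`, `M`, and `A` are illustrative assumptions only.

```python
import numpy as np
from sklearn.decomposition import FastICA


def decompose(X, n_components, seed=0):
    """Factor a genes x samples matrix as X ~ M @ A + mean using FastICA.

    Hypothetical helper for illustration; not part of PyModulon.
    """
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    # Rows (genes) are treated as observations, so each independent
    # component is a vector of gene weights: one column of M.
    M = ica.fit_transform(X)   # shape: (n_genes, n_components)
    # The mixing matrix maps components back to samples; its transpose
    # gives one activity profile per component across samples.
    A = ica.mixing_.T          # shape: (n_components, n_samples)
    return M, A, ica.mean_


# Synthetic stand-in for an expression compendium (assumed sizes).
rng = np.random.default_rng(0)
n_genes, n_samples, k = 200, 12, 3
M_true = rng.laplace(size=(n_genes, k))     # heavy-tailed gene weights
A_true = rng.normal(size=(k, n_samples))    # per-sample activities
X = M_true @ A_true + 0.01 * rng.normal(size=(n_genes, n_samples))

M, A, mean = decompose(X, n_components=k)
X_hat = M @ A + mean                        # reconstruct the input
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(M.shape, A.shape, round(float(rel_err), 4))
```

In this sketch, genes with large absolute weights in a column of `M` would be the candidates for a co-regulated gene set, and the matching row of `A` describes how strongly that set is active in each sample. How the published pipeline selects the number of components and assigns genes to each iModulon is not described in this abstract and is not reproduced here.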

Published in PLoS Computational Biology

ISSN: 1553-734X (Print); 1553-7358 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Science: Biology (General)
Website: https://journals.plos.org/ploscompbiol/
