Preprocessing of Public RNA-Sequencing Datasets to Facilitate Downstream Analyses of Human Diseases

Naomi Rapier-Sharman; John Krapohl; Ethan J. Beausoleil; Kennedy T. L. Gifford; Benjamin R. Hinatsu; Curtis S. Hoffmann; Makayla Komer; Tiana M. Scott; Brett E. Pickett

doi:10.3390/data6070075

Data (Jul 2021)

Affiliations

Naomi Rapier-Sharman: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
John Krapohl: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Ethan J. Beausoleil: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Kennedy T. L. Gifford: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Benjamin R. Hinatsu: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Curtis S. Hoffmann: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Makayla Komer: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Tiana M. Scott: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
Brett E. Pickett: Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA

DOI: https://doi.org/10.3390/data6070075
Journal volume & issue: Vol. 6, no. 7, p. 75

Abstract

Publicly available RNA-sequencing (RNA-seq) data are a rich resource for elucidating the mechanisms of human disease; however, preprocessing these data requires considerable bioinformatic expertise and computational infrastructure. Analyzing multiple datasets with a consistent computational workflow increases the accuracy of downstream meta-analyses. This collection of datasets represents the human intracellular transcriptional response to disorders and diseases such as acute lymphoblastic leukemia (ALL), B-cell lymphomas, chronic obstructive pulmonary disease (COPD), colorectal cancer, and lupus erythematosus, as well as to infection with pathogens including Borrelia burgdorferi, hantavirus, influenza A virus, Middle East respiratory syndrome coronavirus (MERS-CoV), Streptococcus pneumoniae, respiratory syncytial virus (RSV), severe acute respiratory syndrome coronavirus (SARS-CoV), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We calculated the statistically significant differentially expressed genes and Gene Ontology terms for all datasets. In addition, a subset of the datasets includes results from splice variant analyses and intracellular signaling pathway enrichments, as well as read mapping and quantification. All analyses were performed using well-established algorithms and are provided to facilitate future data mining activities and wet lab studies, and to accelerate collaboration and discovery.
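
For readers who plan to mine the differential expression results, the following is a minimal sketch of how a table of per-gene results might be filtered for statistical significance. The abstract does not name the multiple-testing method or the thresholds used, so the Benjamini-Hochberg adjustment, the 0.05 cutoff, the |log2 fold change| >= 1 cutoff, the function names, and the input layout are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(
    results: Dict[str, Tuple[float, float]],
    alpha: float = 0.05,           # assumed cutoff, not taken from the paper
    min_abs_log2fc: float = 1.0,   # assumed cutoff, not taken from the paper
) -> List[str]:
    """Keep genes passing both the adjusted p-value and fold-change cutoffs.

    `results` maps a gene identifier to (log2 fold change, raw p-value).
    """
    genes = list(results)
    padj = benjamini_hochberg([results[g][1] for g in genes])
    return [
        g
        for g, q in zip(genes, padj)
        if q <= alpha and abs(results[g][0]) >= min_abs_log2fc
    ]
```

Applying the same function with the same thresholds to every dataset mirrors the point made above about using a consistent workflow before a meta-analysis; the cutoffs should be replaced with whatever the downstream study requires.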

Published in Data

ISSN: 2306-5729 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Bibliography. Library science. Information resources
Website: http://www.mdpi.com/journal/data
