A benchmark of RNA-seq data normalization methods for transcriptome mapping on human genome-scale metabolic networks

Hatice Büşra Lüleci; Dilara Uzuner; Müberra Fatma Cesur; Atılay İlgün; Elif Düz; Ecehan Abdik; Regan Odongo; Tunahan Çakır

doi:10.1038/s41540-024-00448-z

npj Systems Biology and Applications (Oct 2024)

Affiliations

Hatice Büşra Lüleci: Department of Bioengineering, Gebze Technical University
Dilara Uzuner: Department of Bioengineering, Gebze Technical University
Müberra Fatma Cesur: Department of Bioengineering, Gebze Technical University
Atılay İlgün: Department of Bioengineering, Gebze Technical University
Elif Düz: Department of Bioengineering, Gebze Technical University
Ecehan Abdik: Department of Bioengineering, Gebze Technical University
Regan Odongo: Department of Bioengineering, Gebze Technical University
Tunahan Çakır: Department of Bioengineering, Gebze Technical University

Journal volume & issue: Vol. 10, no. 1, pp. 1–12

Abstract

Genome-scale metabolic models (GEMs) cover all metabolic genes of an organism and their associated reactions in a tissue- and condition-nonspecific manner. RNA-seq provides crucial information for making GEMs condition-specific. The Integrative Metabolic Analysis Tool (iMAT) and Integrative Network Inference for Tissues (INIT) are the two most popular algorithms for creating condition-specific GEMs from human transcriptome data. The method chosen to normalize raw RNA-seq count data affects both the content of the models these algorithms produce and their predictive accuracy. However, the literature lacks a benchmark of how RNA-seq normalization methods affect the performance of iMAT and INIT. Covariates in a dataset, such as age and gender, are another important factor, since they can affect the predictive power of the analysis. In this study, we compared five RNA-seq data normalization methods (TPM, FPKM, TMM, GeTMM, and RLE), together with covariate-adjusted versions of the normalized data, by mapping them onto a human GEM with the iMAT and INIT algorithms to generate personalized metabolic models. We used RNA-seq data from Alzheimer’s disease (AD) and lung adenocarcinoma (LUAD) patients. The results showed that data normalized with the RLE, TMM, or GeTMM methods yielded condition-specific metabolic models with considerably lower variability in the number of active reactions than the within-sample normalization methods (FPKM, TPM). Using these models, we captured disease-associated genes more accurately with the RLE, TMM, and GeTMM normalization methods (average accuracy of ~0.80 for AD and ~0.67 for LUAD). Covariate adjustment increased the accuracy of all methods. We observed a similar accuracy trend when we compared the metabolites of perturbed reactions with metabolome data for AD.
Together, our benchmark study shows that the between-sample RNA-seq normalization methods reduce false positive predictions at the expense of missing some true positive genes when mapped on GEMs.
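
The abstract contrasts within-sample normalizations (FPKM, TPM) with between-sample ones (RLE, TMM, GeTMM). The minimal NumPy sketch below implements three of them from their standard textbook definitions to make that distinction concrete. It is not the authors' pipeline, which the abstract does not describe. TMM and GeTMM are omitted because their trimmed-mean steps are longer, and the function names `fpkm`, `tpm`, `rle_size_factors`, and `rle` are hypothetical.

```python
import numpy as np

def fpkm(counts, lengths_bp):
    """FPKM (within-sample): scale each sample by its total reads (per million),
    then each gene by its length (per kilobase).
    counts: genes x samples raw counts; lengths_bp: gene lengths in base pairs."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    per_million = counts.sum(axis=0, keepdims=True) / 1e6
    return counts / per_million / lengths_kb

def tpm(counts, lengths_bp):
    """TPM (within-sample): length-normalize first (reads per kilobase),
    then rescale each sample so that its values sum to one million."""
    counts = np.asarray(counts, dtype=float)
    rpk = counts / (np.asarray(lengths_bp, dtype=float)[:, None] / 1e3)
    return rpk / rpk.sum(axis=0, keepdims=True) * 1e6

def rle_size_factors(counts):
    """RLE / median-of-ratios size factors (between-sample), as popularized by DESeq:
    compare each sample to a pseudo-reference (per-gene geometric mean),
    using only genes with nonzero counts in every sample."""
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)          # genes usable in log space
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Median over genes of log(sample / reference), one value per sample.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def rle(counts):
    """Divide each sample's raw counts by its RLE size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / rle_size_factors(counts)[None, :]

# Toy data (hypothetical): sample 2 is sequenced exactly twice as deeply as sample 1.
counts = np.array([[10, 20], [100, 200], [50, 100]])
lengths = np.array([1000, 2000, 500])
print(tpm(counts, lengths))
print(rle(counts))
```

Note the design difference. FPKM and TPM rescale each sample using only that sample's own totals. RLE estimates one factor per sample relative to all samples jointly, which is what makes it a between-sample method. In this toy matrix the second sample is a pure depth doubling of the first, so all three methods return identical columns. The methods diverge only when composition differs between samples.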

Published in npj Systems Biology and Applications

ISSN: 2056-7189 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General)
Website: https://www.nature.com/npjsba/

About the journal