International Journal of Molecular Sciences (Feb 2022)

Better Agreement of Human Transcriptomic and Proteomic Cancer Expression Data at the Molecular Pathway Activation Level

  • Mikhail Raevskiy,
  • Maxim Sorokin,
  • Galina Zakharova,
  • Victor Tkachev,
  • Nicolas Borisov,
  • Denis Kuzmin,
  • Kristina Kremenchutckaya,
  • Alexander Gudkov,
  • Dmitry Kamashev,
  • Anton Buzdin

DOI: https://doi.org/10.3390/ijms23052611
Journal volume & issue: Vol. 23, no. 5, p. 2611

Abstract

Previously, we showed that aggregating RNA-level gene expression profiles into quantitative molecular pathway activation metrics reduces batch effects and improves agreement between different experimental platforms. Here, we investigate whether pathway-level data analysis provides any advantage when comparing transcriptomic and proteomic data. We compare paired proteomic and transcriptomic gene expression and pathway activation profiles obtained for the same human cancer biosamples in The Cancer Genome Atlas (TCGA) and the NCI Clinical Proteomic Tumor Analysis Consortium (CPTAC) projects, for a total of 755 samples of glioblastoma, breast, liver, lung, ovarian, pancreatic, and uterine cancers. In the CPTAC assay, expression levels of 15,112 protein-coding genes were profiled using the Thermo QE series of mass spectrometers. In TCGA, RNA expression levels of the same genes were obtained for the same biosamples using the Illumina HiSeq 4000 engine. At the gene level, absolute gene expression values are compared, whereas pathway-level comparisons are made between the pathway activation levels (PALs) calculated using average sample-normalized transcriptomic and proteomic profiles. The average correlations between the primary RNA and protein expression data differed markedly between cancer types: Spearman's rho ranged from 0.017 (p = 1.7 × 10⁻¹³) to 0.27 (p < 2.2 × 10⁻¹⁶). At the pathway level, however, we detected statistically significantly higher correlations overall: averaged rho ranged from 0.022 (p < 2.2 × 10⁻¹⁶) to 0.56 (p < 2.2 × 10⁻¹⁶). We therefore conclude that analysis at the PAL level yields more similar results when comparing high-throughput RNA and protein expression profiles.
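
To make the two levels of comparison concrete, the following minimal Python sketch computes a Spearman rank correlation between paired RNA and protein values, and a toy pathway activation score. The abstract does not state the PAL formula, so `pathway_activation` is an assumption made for illustration: a role-weighted mean of log expression ratios against a reference profile. It should not be read as the authors' exact method. The gene names, the `roles` dictionary (+1 for an assumed activator, -1 for an assumed repressor), and all numeric values are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def pathway_activation(expression, reference, roles):
    """Toy pathway score (assumed formula, not taken from the paper):
    role-weighted mean of log(expression / reference) over pathway genes."""
    weighted = sum(
        role * math.log(expression[gene] / reference[gene])
        for gene, role in roles.items()
    )
    return weighted / sum(abs(role) for role in roles.values())


# Hypothetical paired measurements for one sample (arbitrary units).
rna = {"GENE_A": 12.0, "GENE_B": 3.5, "GENE_C": 8.0, "GENE_D": 1.2}
protein = {"GENE_A": 30.0, "GENE_B": 9.0, "GENE_C": 7.0, "GENE_D": 2.0}
genes = sorted(rna)

# Gene-level agreement between the two platforms.
gene_level_rho = spearman_rho([rna[g] for g in genes],
                              [protein[g] for g in genes])
print("gene-level Spearman rho:", gene_level_rho)

# Pathway-level score for one hypothetical pathway, RNA side.
rna_reference = {g: 5.0 for g in genes}
roles = {"GENE_A": 1, "GENE_B": 1, "GENE_C": -1}
print("toy RNA pathway score:", pathway_activation(rna, rna_reference, roles))
```

In a pathway-level comparison of this kind, one such score would be computed per pathway from each data type, and the rank correlation would then be taken across pathways rather than across individual genes.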

Keywords