BMC Medical Genomics (Mar 2021)

Determining mutational burden and signature using RNA-seq from tumor-only samples

  • Erik Jessen,
  • Yuanhang Liu,
  • Jaime Davila,
  • Jean-Pierre Kocher,
  • Chen Wang

DOI
https://doi.org/10.1186/s12920-021-00898-y
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 9

Abstract

Read online

Abstract Background Traditionally, mutational burden and mutational signatures have been assessed by tumor-normal pair DNA sequencing. The requirement of having both normal and tumor samples is not always feasible from a clinical perspective, and led us to investigate the efficacy of using RNA sequencing of only the tumor sample to determine the mutational burden and signatures, and subsequently molecular cause of the cancer. The potential advantages include reducing the cost of testing, and simultaneously providing information on the gene expression profile and gene fusions present in the tumor. Results In this study, we devised supervised and unsupervised learning methods to determine mutational signatures from tumor RNA-seq data. As applications, we applied the methods to a training set of 587 TCGA uterine cancer RNA-seq samples, and examined in an independent testing set of 521 TCGA colorectal cancer RNA-seq samples. Both diseases are known associated with microsatellite instable high (MSI-H) and driver defects in DNA polymerase ɛ (POLɛ). From RNA-seq called variants, we found majority (> 95%) are likely germline variants, leading to C > T enriched germline variants (dbSNP) widely applicable in tumor and normal RNA-seq samples. We found significant associations between RNA-derived mutational burdens and MSI/POLɛ status, and insignificant relationship between RNA-seq total coverage and derived mutational burdens. Additionally we found that over 80% of variants could be explained by using the COSMIC mutational signature-5, -6 and -10, which are implicated in natural aging, MSI-H, and POLɛ, respectively. For classifying tumor type, within UCEC we achieved a recall of 0.56 and 0.78, and specificity of 0.66 and 0.99 for MSI and POLɛ respectively. By applying learnt RNA signatures from UCEC to COAD, we were able to improve our classification of both MSI and POLɛ. Conclusions Taken together, our work provides a novel method to detect RNA-seq derived mutational signatures with effective procedures to remove likely germline variants. It can leads to accurate classification of underlying driving mechanisms of DNA damage deficiency.

Keywords