Genome Biology (Oct 2024)

Transipedia.org: k-mer-based exploration of large RNA sequencing datasets and application to cancer data

  • Chloé Bessière,
  • Haoliang Xue,
  • Benoit Guibert,
  • Anthony Boureux,
  • Florence Rufflé,
  • Julien Viot,
  • Rayan Chikhi,
  • Mikaël Salson,
  • Camille Marchet,
  • Thérèse Commes,
  • Daniel Gautheret

DOI
https://doi.org/10.1186/s13059-024-03413-5
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 16

Abstract

Indexing techniques relying on k-mers have proven effective in searching for RNA sequences across thousands of RNA-seq libraries, but without enabling direct RNA quantification. We show here that arbitrary RNA sequences can be quantified in seconds through their decomposition into k-mers, with a precision akin to that of conventional RNA quantification methods. Using an index of the Cancer Cell Line Encyclopedia (CCLE) collection consisting of 1019 RNA-seq samples, we show that k-mer indexing offers a powerful means to reveal non-reference sequences, and variant RNAs induced by specific gene alterations, for instance in splicing factors.
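
To illustrate the general idea of quantifying a sequence through its k-mers, here is a minimal Python sketch. It is not the Transipedia implementation: the function names (`kmers`, `build_index`, `quantify`), the in-memory dictionary index, the toy reads, and the use of the median k-mer count as the summary statistic are all assumptions made for illustration. It also ignores reverse complements and normalisation.

```python
from statistics import median

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def build_index(reads, k):
    """Count every k-mer across one sample's reads (toy in-memory index)."""
    counts = {}
    for read in reads:
        for km in kmers(read, k):
            counts[km] = counts.get(km, 0) + 1
    return counts

def quantify(query, index, k):
    """Summarise the query's abundance as the median count of its k-mers.

    The median is an assumed summary statistic, chosen here because it is
    robust to a few k-mers that are absent or over-represented.
    """
    kms = kmers(query, k)
    if not kms:
        raise ValueError("query is shorter than k")
    return median(index.get(km, 0) for km in kms)

# Hypothetical toy data: three short reads from a single sample.
reads = ["ACGTACGTGA", "CGTACGTGAC", "ACGTACGTGA"]
index = build_index(reads, k=5)
print(quantify("ACGTACGTGA", index, k=5))
```

In this sketch, a query whose k-mers are all missing from the index gets an abundance of zero, while a query spanning a junction or variant is supported only by the k-mers that actually cover it.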

Keywords