Genome Biology (Mar 2019)

Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

  • Avi Srivastava,
  • Laraib Malik,
  • Tom Smith,
  • Ian Sudbery,
  • Rob Patro

DOI
https://doi.org/10.1186/s13059-019-1670-y
Journal volume & issue
Vol. 20, no. 1
pp. 1 – 16

Abstract

We introduce alevin, a fast end-to-end pipeline to process droplet-based single-cell RNA sequencing (dscRNA-seq) data, performing cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin's approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is also considerably faster than existing gene quantification approaches, typically by a factor of eight, while using less memory.
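
To make the bias described above concrete, the following toy sketch shows what happens when a counter keeps only gene-unique reads. It is not alevin's algorithm and does not use its API. The reads, barcodes, gene names, and the function `naive_counts` are all hypothetical, chosen only to illustrate how gene-ambiguous reads drop out of a naive per-gene UMI count.

```python
from collections import defaultdict

# Hypothetical toy reads: (cell barcode, UMI, genes the read maps to).
reads = [
    ("AAAC", "TTGA", {"geneA"}),
    ("AAAC", "TTGA", {"geneA"}),           # PCR duplicate of the read above
    ("AAAC", "CCGT", {"geneA", "geneB"}),  # gene-ambiguous read
    ("AAAC", "GGAT", {"geneB"}),
    ("AAAC", "ACGT", {"geneA", "geneB"}),  # gene-ambiguous read
]


def naive_counts(reads):
    """Count distinct UMIs per (cell, gene), discarding gene-ambiguous reads.

    This mimics the simple strategy the abstract criticizes; it is an
    illustrative assumption, not the method of any specific tool.
    """
    umis = defaultdict(set)
    discarded = 0
    for cell, umi, genes in reads:
        if len(genes) != 1:
            # The read maps to more than one gene, so it is thrown away.
            discarded += 1
            continue
        (gene,) = genes
        umis[(cell, gene)].add(umi)
    counts = {key: len(value) for key, value in umis.items()}
    return counts, discarded


counts, discarded = naive_counts(reads)
print(counts)
print(f"discarded {discarded} of {len(reads)} reads")
```

In this toy input, two of the five reads carry UMIs that appear nowhere else, yet neither contributes to any gene's count. Genes that share sequence with other genes lose evidence systematically under this strategy, which is the bias the abstract says alevin addresses by also accounting for multimapping reads.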

Keywords