Alevin efficiently estimates accurate gene abundances from dscRNA-seq data

Avi Srivastava; Laraib Malik; Tom Smith; Ian Sudbery; Rob Patro

doi:10.1186/s13059-019-1670-y

Genome Biology (Mar 2019)

Affiliations

Avi Srivastava: Department of Computer Science, Stony Brook University
Laraib Malik: Department of Computer Science, Stony Brook University
Tom Smith: Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge
Ian Sudbery: Sheffield Institute for Nucleic Acids, Department of Molecular Biology and Biotechnology, The University of Sheffield
Rob Patro: Department of Computer Science, Stony Brook University

Journal volume & issue: Vol. 20, no. 1, pp. 1–16

Abstract

We introduce alevin, a fast end-to-end pipeline for processing droplet-based single-cell RNA sequencing (dscRNA-seq) data that performs cell barcode detection, read mapping, unique molecular identifier (UMI) deduplication, gene count estimation, and cell barcode whitelisting. Alevin’s approach to UMI deduplication considers transcript-level constraints on the molecules from which UMIs may have arisen and accounts for both gene-unique reads and reads that multimap between genes. This addresses the inherent bias in existing tools, which discard gene-ambiguous reads, and improves the accuracy of gene abundance estimates. Alevin is also considerably faster than existing gene quantification approaches, typically by a factor of eight, and uses less memory.
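
The bias mentioned in the abstract can be illustrated with a minimal sketch of the baseline strategy it describes, in which gene-ambiguous reads are discarded before UMIs are counted. This is not alevin's algorithm. The function name `dedup_gene_unique`, the input format (cell barcode, UMI, set of compatible genes), and the toy reads are all assumptions made for illustration.

```python
from collections import defaultdict


def dedup_gene_unique(reads):
    """Count distinct UMIs per (cell, gene), discarding gene-ambiguous reads.

    `reads` is an iterable of (cell_barcode, umi, genes) tuples, where
    `genes` is the set of gene IDs the read is compatible with. This input
    format is hypothetical and chosen only for the illustration.
    Returns (counts, n_discarded).
    """
    umis = defaultdict(set)
    discarded = 0
    for cell, umi, genes in reads:
        if len(genes) != 1:
            # Baseline behaviour: a read compatible with several genes is dropped.
            discarded += 1
            continue
        (gene,) = genes
        # Reads sharing cell, gene, and UMI collapse into one molecule.
        umis[(cell, gene)].add(umi)
    counts = {key: len(seen) for key, seen in umis.items()}
    return counts, discarded


# Toy data (made up): one PCR duplicate and one multimapping read.
reads = [
    ("AAAC", "TTGA", {"geneA"}),
    ("AAAC", "TTGA", {"geneA"}),           # duplicate of the read above
    ("AAAC", "CCGT", {"geneA"}),
    ("AAAC", "GGAT", {"geneA", "geneB"}),  # gene-ambiguous read
    ("AAAC", "ACAC", {"geneB"}),
]
counts, discarded = dedup_gene_unique(reads)
print(counts, discarded)
```

In this toy input, the molecule tagged `GGAT` contributes to neither gene, so the expression of one of them is underestimated. The abstract states that alevin avoids this by also accounting for reads that multimap between genes.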

Published in Genome Biology

ISSN: 1474-760X (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Science: Biology (General): Genetics
Website: https://genomebiology.biomedcentral.com/
