UMIc: A Preprocessing Method for UMI Deduplication and Reads Correction

Maria Tsagiopoulou; Maria Christina Maniou; Nikolaos Pechlivanis; Anastasis Togkousidis; Michaela Kotrová; Tobias Hutzenlaub; Ilias Kappas; Anastasia Chatzidimitriou; Fotis Psomopoulos

doi:10.3389/fgene.2021.660366

Frontiers in Genetics (May 2021)

Affiliations

Maria Tsagiopoulou: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
Maria Christina Maniou: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
Nikolaos Pechlivanis: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
Nikolaos Pechlivanis: Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
Anastasis Togkousidis: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
Michaela Kotrová: Unit for Hematological Diagnostics, Department of Internal Medicine II, University Medical Center Schleswig-Holstein, Kiel, Germany
Tobias Hutzenlaub: Laboratory for MEMS Applications, IMTEK-Department of Microsystems Engineering, University of Freiburg, Freiburg, Germany
Tobias Hutzenlaub: Hahn-Schickard, Freiburg, Germany
Ilias Kappas: Department of Genetics, Development and Molecular Biology, School of Biology, Aristotle University of Thessaloniki, Thessaloniki, Greece
Anastasia Chatzidimitriou: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece
Fotis Psomopoulos: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece

Journal volume & issue: Vol. 12

Abstract

A recent refinement in high-throughput sequencing is the incorporation of unique molecular identifiers (UMIs), which are random oligonucleotide barcodes, during library preparation. A UMI gives each DNA/RNA input molecule a unique identity that is carried through polymerase chain reaction (PCR) amplification, thus reducing the bias introduced by this step. Here, we propose UMIc, an alignment-free framework that serves as a preprocessing step for FASTQ files, deduplicating and correcting reads by building a consensus sequence from each UMI. Our approach takes into account the frequency and the Phred quality of nucleotides, as well as the distances between the UMIs and between the actual sequences. We have tested the tool on different scenarios of UMI-tagged library data, with broad applicability in mind. UMIc is an open-source tool implemented in R and is freely available from https://github.com/BiodataAnalysisGroup/UMIc.
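
To make the idea of a per-UMI consensus more concrete, the following is a minimal sketch in Python. It is not the UMIc implementation (which is written in R) and it makes several simplifying assumptions: the UMI is taken to be a fixed-length prefix of each read, qualities are Phred+33 encoded, reads are grouped by exact UMI match, and each consensus base is the one with the highest summed Phred score. The distance-based merging of similar UMIs and sequences mentioned in the abstract is left out. The function names `phred_scores` and `consensus_by_umi` are hypothetical.

```python
from collections import defaultdict


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def consensus_by_umi(reads, umi_length):
    """Collapse reads that share a UMI into one consensus sequence per UMI.

    `reads` is an iterable of (sequence, quality) pairs. The first
    `umi_length` bases of each sequence are treated as the UMI (an
    assumption of this sketch, not a statement about UMIc's input layout).
    """
    # Group the non-UMI part of each read by its exact UMI.
    groups = defaultdict(list)
    for seq, qual in reads:
        umi = seq[:umi_length]
        groups[umi].append((seq[umi_length:], phred_scores(qual[umi_length:])))

    consensus = {}
    for umi, members in groups.items():
        # Only positions covered by every read in the group are considered.
        length = min(len(s) for s, _ in members)
        bases = []
        for pos in range(length):
            # Each observed base is weighted by the sum of its Phred scores,
            # so both frequency and quality contribute to the vote.
            weight = defaultdict(int)
            for s, q in members:
                weight[s[pos]] += q[pos]
            # Sorting first makes ties resolve alphabetically.
            bases.append(max(sorted(weight), key=weight.get))
        consensus[umi] = "".join(bases)
    return consensus


example_reads = [
    ("AACGACGT", "IIIIIIII"),
    ("AACGACCT", "IIIIII#I"),  # low-quality mismatch in the third insert base
    ("AACGACGT", "IIIIIIII"),
    ("TTGGGGCA", "IIIIIIII"),
]
print(consensus_by_umi(example_reads, umi_length=4))
```

In this toy input, three reads share the UMI `AACG`; the single low-quality `C` is outvoted by two high-quality `G` calls, so the group collapses to one corrected read, while the read tagged `TTGG` is kept as is.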

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics
