Computational problems of analysis of short next generation sequencing reads

R. te Boekhorst; F. M. Naumenko; N. G. Orlova; E. R. Galieva; A. M. Spitsina; I. V. Chadaeva; Y. L. Orlov; I. I. Abnizova

doi:10.18699/VJ16.191

Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding), Feb 2017

Affiliations

R. te Boekhorst: University of Hertfordshire
F. M. Naumenko: Novosibirsk State University
N. G. Orlova: Novosibirsk State University; Novosibirsk State University of Architecture and Civil Engineering (Sibstrin)
E. R. Galieva: Novosibirsk State University; Institute of Cytology and Genetics SB RAS
A. M. Spitsina: Novosibirsk State University
I. V. Chadaeva: Novosibirsk State University
Y. L. Orlov: Novosibirsk State University; Institute of Cytology and Genetics SB RAS
I. I. Abnizova: Wellcome Trust Sanger Institute

DOI: https://doi.org/10.18699/VJ16.191
Journal volume & issue: Vol. 20, no. 6
pp. 746 – 755

Abstract

Short-read next generation sequencing (NGS) has had a significant impact on modern genomics, genetics, cell biology and medicine, especially on metagenomics, comparative genomics, polymorphism detection, mutation screening, transcriptome profiling, methylation profiling, chromatin remodelling and many other applications. However, NGS is prone to errors, which complicate scientific conclusions. NGS technologies shear DNA molecules into a collection of numerous small fragments, called a ‘library’, which are then sequenced extensively in parallel. The sequenced overlapping fragments, called ‘reads’, are assembled into contiguous strings. The contiguous sequences are in turn assembled into genomes for further analysis. Computational sequencing problems are those arising from the numerical processing of sequenced samples. This processing involves procedures such as quality scoring, mapping/assembly and, surprisingly, error correction of the data. This paper reviews post-processing errors and the computational methods used to discern them. It also includes a sequencing dictionary. We present quality control of raw data and the errors arising at the steps of aligning sequencing reads to a reference genome and of assembly. Finally, this work presents the identification of mutations (“variant calling”) in sequencing data and its quality control.
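As an illustration of the quality-scoring step mentioned in the abstract, the following is a minimal sketch of a per-read quality check. It assumes the widely used Phred convention, in which a score Q corresponds to a base-calling error probability P = 10^(-Q/10), and FASTQ quality strings encoded with an ASCII offset of 33. The function names and the 0.01 threshold are hypothetical choices made for this example and are not taken from the paper.

```python
def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_error_prob(qual: str, offset: int = 33) -> float:
    """Average the per-base error probabilities of one read."""
    scores = decode_quality(qual, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


def passes_qc(qual: str, max_mean_error: float = 0.01, offset: int = 33) -> bool:
    """Hypothetical filter: keep a read if its mean error probability is low enough."""
    return mean_error_prob(qual, offset) <= max_mean_error


if __name__ == "__main__":
    # 'I' encodes Q40, '5' encodes Q20 and '!' encodes Q0 under Phred+33.
    print(decode_quality("I5!"))
    print(passes_qc("IIIIIIII"), passes_qc("!!!!!!!!"))
```

Averaging error probabilities rather than raw Phred scores is a deliberate choice in this sketch: because the scale is logarithmic, a few very poor bases dominate the mean error even when the mean score looks acceptable.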

Published in Вавиловский журнал генетики и селекции (Vavilov Journal of Genetics and Breeding)

ISSN: 2500-0462 (Print); 2500-3259 (Online)
Publisher: Siberian Branch of the Russian Academy of Sciences, Federal Research Center Institute of Cytology and Genetics, The Vavilov Society of Geneticists and Breeders
Country of publisher: Russian Federation
LCC subjects: Science: Biology (General): Genetics
Website: http://vavilov.elpub.ru/
