Genes (Jan 2019)

A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads

  • Wenjing Zhang,
  • Neng Huang,
  • Jiantao Zheng,
  • Xingyu Liao,
  • Jianxin Wang,
  • Hong-Dong Li

DOI
https://doi.org/10.3390/genes10010044
Journal volume & issue
Vol. 10, no. 1
p. 44

Abstract

The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics owing to their long reads. However, the high error rate and poor quality of TGS reads pose new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are needed to prioritize high-quality reads and thereby improve the results of error correction and assembly. In this study, we propose a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data consisting of high-quality and low-quality reads, each characterized by its nucleotide combinations. A linear regression model is then built to score the quality of reads. The method was tested on three datasets from different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. Contig assemblies based on the top-scored reads also outperformed conventional approaches that use all reads. REQUEST is able to distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising sequence-based alternative to alignment-based quality evaluation algorithms.
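The general idea of scoring reads from their sequence content alone can be illustrated with a minimal Python sketch. It is not the REQUEST implementation: it assumes that "nucleotide combinations" can be approximated by dinucleotide (k = 2) frequencies, fits the linear model by plain gradient descent, and uses made-up toy reads with arbitrary labels (1.0 for "high quality", 0.0 for "low quality"). The function names `kmer_features`, `fit_linear`, and `score` are hypothetical.

```python
from itertools import product


def kmer_features(read, k=2):
    """Relative frequency of every k-mer over ACGT, in a fixed order.

    Assumption: k-mer frequencies stand in for the paper's
    "nucleotide combinations"; the abstract does not specify them.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[m] / total for m in kmers]


def fit_linear(X, y, lr=0.5, epochs=2000):
    """Fit y ~ w.x + b by batch gradient descent on mean squared error."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(read, w, b, k=2):
    """Predicted quality score of a read under the fitted linear model."""
    return sum(wj * xj for wj, xj in zip(w, kmer_features(read, k))) + b


# Toy training data (invented for illustration only, not real reads).
high = ["ACGTACGTACGTACGT", "TGCATGCATGCATGCA", "GATCGATCGATCGATC"]
low = ["AAAAAAAACCCCCCCC", "GGGGGGGGTTTTTTTT", "AAAATTTTTTTTAAAA"]

X = [kmer_features(r) for r in high + low]
y = [1.0] * len(high) + [0.0] * len(low)
w, b = fit_linear(X, y)

# Rank reads by predicted score, highest first, as a read selector would.
ranked = sorted(high + low, key=lambda r: score(r, w, b), reverse=True)
for r in ranked:
    print(f"{score(r, w, b):.3f}  {r}")
```

In a realistic setting the labels would come from the tool's own training-data generation step rather than from hand-picked strings, and the top-ranked fraction of reads would be passed on to alignment or assembly.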

Keywords