Considerations for Optimization of High-Throughput Sequencing Bioinformatics Pipelines for Virus Detection

Christophe Lambert; Cassandra Braxton; Robert L. Charlebois; Avisek Deyati; Paul Duncan; Fabio La Neve; Heather D. Malicki; Sebastien Ribrioux; Daniel K. Rozelle; Brandye Michaels; Wenping Sun; Zhihui Yang; Arifa S. Khan

doi:10.3390/v10100528

Viruses (Sep 2018)

Affiliations

Christophe Lambert: GSK, 1330 Rixensart, Belgium
Cassandra Braxton: Biogen Inc., Research Triangle Park, NC 27709, USA
Robert L. Charlebois: Analytical Research and Development, Sanofi Pasteur, Toronto, ON M2R 3T4, Canada
Avisek Deyati: GSK, 1330 Rixensart, Belgium
Paul Duncan: Merck & Co. Inc., West Point, PA 19486, USA
Fabio La Neve: Merck KGaA, 10010 Torino, Italy
Heather D. Malicki: WuXi AppTec, Philadelphia, PA 19112, USA
Sebastien Ribrioux: Genedata AG, 4053 Basel, Switzerland
Daniel K. Rozelle: Radiant Systems, Inc., Plainfield, NJ 07080, USA
Brandye Michaels: Analytical Research and Development: Microbiology, Pfizer Inc., Andover, MA 01810, USA
Wenping Sun: WuXi AppTec, Philadelphia, PA 19112, USA
Zhihui Yang: Office of Applied Research and Safety Assessment, Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD 20708, USA
Arifa S. Khan: Office of Vaccines Research and Review, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, MD 20993, USA

DOI: https://doi.org/10.3390/v10100528
Journal volume & issue: Vol. 10, no. 10
p. 528

Abstract

High-throughput sequencing (HTS) has demonstrated its capability for broad virus detection through the discovery of known and novel viruses in a variety of samples, including clinical, environmental, and biological samples. An important goal for HTS applications in biologics is to establish parameter settings that afford adequate sensitivity at an acceptable computational cost (computation time, memory, storage, expense, and/or efficiency) at critical steps in the bioinformatics pipeline, including initial data quality assessment, trimming/cleaning, and assembly (to reduce data volume and increase the likelihood of appropriate sequence identification). Additionally, the quality and reliability of the results depend on the availability of a complete and curated viral database; the selection and configuration of sequence alignment programs that retain specificity for broad virus detection while reducing false-positive signals; the removal of host sequences without loss of endogenous viral sequences of interest; and the use of a meaningful reporting format that retains critical information from the analysis and presents readily interpretable data and actionable results. Furthermore, after alignment, both automated and manual evaluation may be needed to verify the results and help assign a potential risk level to residual, unmapped reads. We hope that the collective considerations discussed in this paper will aid the optimization of data analysis pipelines for virus detection by HTS.
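
As a rough illustration of the trimming/cleaning step mentioned above, the following minimal sketch trims low-quality bases from the 3' end of each read and discards reads that become too short. It is not taken from the paper: the function names, the Phred+33 encoding, and the thresholds (`min_quality=20`, `min_length=30`) are assumptions chosen only to show how such parameter settings trade data volume against retained signal.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    The Phred+33 offset is an assumption; some older data use +64.
    """
    return [ord(ch) - offset for ch in quality_string]


def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Remove bases from the 3' end until one has a score >= min_quality."""
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]


def clean_reads(reads, min_quality=20, min_length=30):
    """Trim each (sequence, quality) pair and drop reads shorter than min_length.

    Both thresholds are hypothetical defaults, not values recommended by the paper.
    """
    kept = []
    for sequence, quality in reads:
        trimmed_seq, trimmed_qual = trim_3prime(sequence, quality, min_quality)
        if len(trimmed_seq) >= min_length:
            kept.append((trimmed_seq, trimmed_qual))
    return kept


if __name__ == "__main__":
    example_reads = [
        ("ACGTACGTAC", "IIIIIIII##"),  # last two bases are low quality
        ("ACGT", "####"),              # entirely low quality
    ]
    print(clean_reads(example_reads, min_quality=20, min_length=5))
```

Raising `min_quality` or `min_length` reduces the data passed to assembly and alignment, which lowers computational cost but may also discard reads from low-abundance viruses, so any such thresholds would need to be evaluated against the sensitivity required.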

Published in Viruses

ISSN: 1999-4915 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Microbiology
Website: http://www.mdpi.com/journal/viruses
