Microbiology Spectrum (Apr 2022)

VPipe: an Automated Bioinformatics Platform for Assembly and Management of Viral Next-Generation Sequencing Data

  • Darlene D. Wagner,
  • Rachel L. Marine,
  • Edward Ramos,
  • Terry Fei Fan Ng,
  • Christina J. Castro,
  • Margaret Okomo-Adhiambo,
  • Krysten Harvey,
  • Gregory Doho,
  • Reagan Kelly,
  • Yatish Jain,
  • Roman L. Tatusov,
  • Hideky Silva,
  • Paul A. Rota,
  • Agha N. Khan,
  • M. Steven Oberste

DOI
https://doi.org/10.1128/spectrum.02564-21
Journal volume & issue
Vol. 10, no. 2

Abstract

ABSTRACT Next-generation sequencing (NGS) is a powerful tool for detecting and investigating viral pathogens; however, analysis and management of the enormous amounts of data generated by these technologies remain a challenge. Here, we present VPipe (the Viral NGS Analysis Pipeline and Data Management System), an automated bioinformatics pipeline optimized for whole-genome assembly of viral sequences and identification of diverse species. VPipe automates the data quality control, assembly, and contig identification steps typically performed when analyzing NGS data. Users access the pipeline through a secure web-based portal, which provides an easy-to-use interface with advanced search capabilities for reviewing results. In addition, VPipe provides a centralized system for storing and analyzing NGS data, eliminating common bottlenecks in bioinformatics analyses for public health laboratories with limited on-site computational infrastructure. The performance of VPipe was validated through the analysis of publicly available NGS data sets for viral pathogens, generating high-quality assemblies for 12 data sets. VPipe also generated assemblies with greater contiguity than similar pipelines for 41 human respiratory syncytial virus isolates and 23 SARS-CoV-2 specimens.

IMPORTANCE Computational infrastructure and bioinformatics analysis are bottlenecks in the application of NGS to viral pathogens. As of September 2021, VPipe has been used by the U.S. Centers for Disease Control and Prevention (CDC) and 12 state public health laboratories to characterize >17,500 and 1,500 clinical specimens and isolates, respectively. VPipe automates genome assembly for a wide range of viruses, including high-consequence pathogens such as SARS-CoV-2. Such automated functionality expedites public health responses to viral outbreaks and pathogen surveillance.

Keywords