BMC Microbiology (Apr 2009)

Virus variation resources at the National Center for Biotechnology Information: dengue virus

  • Rozanov Michael,
  • Kiryutin Boris,
  • Zaslavsky Leonid,
  • Resch Wolfgang,
  • Bao Yiming,
  • Tatusova Tatiana A

DOI
https://doi.org/10.1186/1471-2180-9-65
Journal volume & issue
Vol. 9, no. 1
p. 65

Abstract

Read online

Abstract Background There is an increasing number of complete and incomplete virus genome sequences available in public databases. This large body of sequence data harbors information about epidemiology, phylogeny, and virulence. Several specialized databases, such as the NCBI Influenza Virus Resource or the Los Alamos HIV database, offer sophisticated query interfaces along with integrated exploratory data analysis tools for individual virus species to facilitate extracting this information. Thus far, there has not been a comprehensive database for dengue virus, a significant public health threat. Results We have created an integrated web resource for dengue virus. The technology developed for the NCBI Influenza Virus Resource has been extended to process non-segmented dengue virus genomes. In order to allow efficient processing of the dengue genome, which is large in comparison with individual influenza segments, we developed an offline pre-alignment procedure which generates a multiple sequence alignment of all dengue sequences. The pre-calculated alignment is then used to rapidly create alignments of sequence subsets in response to user queries. This improvement in technology will also facilitate the incorporation of additional virus species in the future. The set of virus-specific databases at NCBI, which will be referred to as Virus Variation Resources (VVR), allow users to build complex queries against virus-specific databases and then apply exploratory data analysis tools to the results. The metadata is automatically collected where possible, and extended with data extracted from the literature. Conclusion The NCBI Dengue Virus Resource integrates dengue sequence information with relevant metadata (sample collection time and location, disease severity, serotype, sequenced genome region) and facilitates retrieval and preliminary analysis of dengue sequences using integrated web analysis and visualization tools.