BMC Bioinformatics (Jun 2008)

Identifying a few foot-and-mouth disease virus signature nucleotide strings for computational genotyping

  • Xu Lizhe,
  • Wan Xiu-Feng,
  • Wu Junfeng,
  • Cai Zhipeng,
  • Lin Guohui,
  • Goebel Randy

DOI
https://doi.org/10.1186/1471-2105-9-279
Journal volume & issue
Vol. 9, no. 1
p. 279

Abstract


Background: Serotypes of foot-and-mouth disease viruses (FMDVs) have generally been determined by biological experiments. Computational genotyping has not been well studied, even though whole viral genomes are available, because genes evolve at uneven rates and genetic recombination is frequent. Naive sequence comparison achieves only limited success in genotyping.

Results: We used 129 FMDV strains of known serotype as training strains to select as many as 140 of the most serotype-specific nucleotide strings. We then constructed a linear-kernel Support Vector Machine classifier using these 140 strings. Under leave-one-out cross-validation, this classifier assigned the correct serotype to 127 of the 129 strains, an accuracy of 98.45%. It also correctly assigned serotypes to an independent test set of 83 other FMDV strains downloaded separately from NCBI GenBank.

Conclusion: Once detailed molecular sequences are available, computational genotyping is much faster and cheaper than wet-lab biological experiments. The high accuracy of the proposed method suggests that a few signature nucleotide strings, rather than whole genomes, could be used to determine the serotypes of novel FMDV strains.
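The abstract does not say how the serotype-specific strings were scored or how strains were encoded for the classifier, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes fixed-length substrings (k-mers), a hypothetical specificity score (the fraction of strains in one serotype that contain a string, minus the highest such fraction in any other serotype), and a 0/1 presence encoding. The function names and the toy sequences are invented for illustration and are not real FMDV data.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_signature_strings(strains, k, top_n):
    """Keep the top_n k-mers under a simple, assumed specificity score.

    strains: list of (sequence, serotype) pairs.
    Score (an assumption, not the paper's criterion): the fraction of
    strains in the best-matching serotype that contain the k-mer, minus
    the highest such fraction among the remaining serotypes.
    """
    by_type = defaultdict(list)
    for seq, serotype in strains:
        by_type[serotype].append(kmers(seq, k))
    candidates = set().union(*(s for sets in by_type.values() for s in sets))

    def frac(kmer, serotype):
        sets = by_type[serotype]
        return sum(kmer in s for s in sets) / len(sets)

    scored = []
    for kmer in candidates:
        fracs = {t: frac(kmer, t) for t in by_type}
        best = max(fracs, key=fracs.get)
        others = [f for t, f in fracs.items() if t != best]
        score = fracs[best] - (max(others) if others else 0.0)
        scored.append((score, kmer))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [kmer for _, kmer in scored[:top_n]]


def featurize(seq, strings):
    """Encode a sequence as a 0/1 presence vector over the selected strings."""
    return [1 if s in seq else 0 for s in strings]


# Toy, made-up sequences with serotype labels (not real FMDV genomes).
toy = [
    ("ACGTTTGA", "O"),
    ("ACGTTAGA", "O"),
    ("GGCATCCA", "A"),
    ("GGCATGCA", "A"),
]
signatures = select_signature_strings(toy, k=4, top_n=4)
vectors = [featurize(seq, signatures) for seq, _ in toy]

# Reported leave-one-out result: 127 of 129 training strains correct.
loocv_accuracy = 127 / 129
print(signatures)
print(vectors)
print(f"{loocv_accuracy:.2%}")
```

Presence vectors of this kind are the sort of input a linear-kernel classifier could be trained on. In leave-one-out cross-validation, each strain is held out in turn, the classifier is fitted on the remaining strains, and the held-out strain is then predicted; the last lines only reproduce the arithmetic behind the reported accuracy.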