IEEE Access (Jan 2019)

VFM: Identification of Bacteriophages From Metagenomic Bins and Contigs Based on Features Related to Gene and Genome Composition

  • Qiaoliang Liu,
  • Fu Liu,
  • Jiaxue He,
  • Miaolei Zhou,
  • Tao Hou,
  • Yun Liu

DOI
https://doi.org/10.1109/ACCESS.2019.2957833
Journal volume & issue
Vol. 7
pp. 177529 – 177538

Abstract

As the main regulators of microbial community composition, bacteriophages are widespread on Earth. However, because they are hidden in metagenomes, most of them remain unknown. To identify phages from metagenomes more effectively, this paper presents a new tool named VFM (Virus Finding & Mining). VFM has two versions, bin-VFM and unbin-VFM. Eighteen new features describing codon usage bias, the proportion of hits to clusters of orthologous groups of proteins (COG), and 1-mer and 2-mer frequencies are introduced to improve the performance of the classifiers. By using missing value interpolation, bin-VFM significantly improves the classification performance for short sequence bins. Compared with previous tools for virus mining, bin-VFM and unbin-VFM perform much better on simulated and real metagenomes with short and long sequences, respectively. Thus, VFM may play a helpful role in studies of metagenome-related problems, such as horizontal gene transfer and antibiotic resistance. VFM is freely available at https://github.com/liuql2019/VFM.
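
To make the composition features mentioned in the abstract concrete, the following is a minimal sketch of how 1-mer and 2-mer frequencies might be computed for a single contig. It is not taken from the VFM code base: the function name `kmer_frequencies`, the choice to skip windows containing ambiguous bases such as N, and the normalization by the number of valid windows are all assumptions made for illustration, and the sketch does not reproduce the eighteen features described in the paper.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"  # assumption: only unambiguous DNA bases are counted


def kmer_frequencies(sequence, k):
    """Return the relative frequency of every possible k-mer over ACGT.

    Hypothetical helper for illustration; windows containing characters
    outside ACGT (for example N) are ignored.
    """
    seq = sequence.upper()
    # Enumerate all 4**k possible k-mers so the output has a fixed layout.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalize only over windows made of unambiguous bases.
    total = sum(counts[m] for m in kmers)
    if total == 0:
        return {m: 0.0 for m in kmers}
    return {m: counts[m] / total for m in kmers}


if __name__ == "__main__":
    contig = "ATGCGCGTTAACGN"  # toy sequence, not real data
    mono = kmer_frequencies(contig, 1)  # 4 values
    di = kmer_frequencies(contig, 2)    # 16 values
    print(mono)
    print(di)
```

Enumerating all possible k-mers up front keeps the feature vector the same length for every sequence, which is what a downstream classifier would typically expect.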

Keywords