Frontiers in Genetics (Feb 2023)

RF_phage virion: Classification of phage virion proteins with a random forest model

  • Yanqin Zhang
  • Zhiyuan Li

DOI
https://doi.org/10.3389/fgene.2022.1103783
Journal volume & issue
Vol. 13

Abstract

Introduction: Phages play essential roles in biological processes, and the virion proteins encoded by the phage genome constitute critical elements of the assembled phage particle. Methods: This study uses machine learning methods to classify phage virion proteins. We propose a novel approach, RF_phage virion, for the effective classification of virion and non-virion proteins. The model uses four protein sequence coding methods as features, and the random forest algorithm is employed to solve the classification problem. Results: The performance of the RF_phage virion model was evaluated by comparing it with that of classical machine learning methods. The proposed method achieved a specificity (Sp) of 93.37%, sensitivity (Sn) of 90.30%, accuracy (Acc) of 91.84%, Matthews correlation coefficient (MCC) of 0.8371, and an F1 score of 0.9196.
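
The abstract reports five metrics without defining them. As a minimal sketch, assuming the standard confusion-matrix definitions with virion proteins as the positive class, they could be computed as follows. The function name `binary_metrics` and the counts in the usage note are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc, MCC and F1 from confusion-matrix counts.

    Assumes the standard definitions; the abstract does not state
    which formulas the authors used.
    tp/fn: positive-class (virion) samples predicted right/wrong.
    tn/fp: negative-class (non-virion) samples predicted right/wrong.
    """
    sn = tp / (tp + fn)                       # sensitivity (recall)
    sp = tn / (tn + fp)                       # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)     # overall accuracy
    # MCC denominator is zero when any row or column of the matrix is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn)          # harmonic mean of precision and recall
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not the paper's data.
    for name, value in binary_metrics(tp=90, fn=10, tn=93, fp=7).items():
        print(f"{name}: {value:.4f}")
```

Calling `binary_metrics(tp=90, fn=10, tn=93, fp=7)` returns a dictionary keyed by the abbreviations used in the abstract. Sn, Sp, Acc and F1 lie between 0 and 1 and are usually reported as percentages or decimals, while MCC lies between -1 and 1.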

Keywords