Frontiers in Genetics (Mar 2024)

Compositional features analysis by machine learning in genome represents linear adaptation of monkeypox virus

  • Sen Zhang,
  • Ya-Dan Li,
  • Yu-Rong Cai,
  • Yu-Rong Cai,
  • Xiao-Ping Kang,
  • Ye Feng,
  • Yu-Chang Li,
  • Yue-Hong Chen,
  • Jing Li,
  • Jing Li,
  • Li-Li Bao,
  • Tao Jiang

DOI
https://doi.org/10.3389/fgene.2024.1361952
Journal volume & issue
Vol. 15

Abstract

Read online

Introduction: The global headlines have been dominated by the sudden and widespread outbreak of monkeypox, a rare and endemic zoonotic disease caused by the monkeypox virus (MPXV). Genomic composition based machine learning (ML) methods have recently shown promise in identifying host adaptability and evolutionary patterns of virus. Our study aimed to analyze the genomic characteristics and evolutionary patterns of MPXV using ML methods.Methods: The open reading frame (ORF) regions of full-length MPXV genomes were filtered and 165 ORFs were selected as clusters with the highest homology. Unsupervised machine learning methods of t-distributed stochastic neighbor embedding (t-SNE), Principal Component Analysis (PCA), and hierarchical clustering were performed to observe the DCR characteristics of the selected ORF clusters.Results: The results showed that MPXV sequences post-2022 showed an obvious linear adaptive evolution, indicating that it has become more adapted to the human host after accumulating mutations. For further accurate analysis, the ORF regions with larger variations were filtered out based on the ranking of homology difference to narrow down the key ORF clusters, which drew the same conclusion of linear adaptability. Then key differential protein structures were predicted by AlphaFold 2, which meant that difference in main domains might be one of the internal reasons for linear adaptive evolution.Discussion: Understanding the process of linear adaptation is critical in the constant evolutionary struggle between viruses and their hosts, playing a significant role in crafting effective measures to tackle viral diseases. Therefore, the present study provides valuable insights into the evolutionary patterns of the MPXV in 2022 from the perspective of genomic composition characteristics analysis through ML methods.

Keywords