Genomics & Informatics (Oct 2024)

A prediction of mutations in infectious viruses using artificial intelligence

  • Won Jong Choi,
  • Jongkeun Park,
  • Do Young Seong,
  • Dae Sun Chung,
  • Dongwan Hong

DOI
https://doi.org/10.1186/s44342-024-00019-y
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 11

Abstract

Many subtypes of SARS-CoV-2 have emerged since the early stages of the pandemic, with mutations showing regional and racial differences. These mutations have significantly affected the infectivity and severity of the virus. This study aimed to predict the mutations that occur during the evolution of SARS-CoV-2 and to identify the key characteristics needed to make these predictions. We collected and organized data on the lineage, date, clade, and mutations of SARS-CoV-2 from publicly available databases and processed them for mutation prediction. We then applied several artificial intelligence models to predict newly emerging mutations and built several training sets based on clade information. Models trained on mutation information alone performed poorly, whereas incorporating clade differentiation yielded high performance in machine learning models, including XGBoost (accuracy: 0.999). However, mutations fixed in the receptor-binding motif (RBM) region of Omicron decreased predictive performance. Using these models, we predicted potential mutation positions for clade 24C, the successor to the recently emerged 24A and 24B clades, and identified a mutation at position Q493 in the RBM region. Our study provides effective artificial intelligence models and characteristics for predicting new mutations in continuously evolving infectious viruses.
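
The contrast between mutation-only and clade-aware training sets can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy records, the one-hot encoding, and the helper names `encode` and `conflicting_rows` are assumptions made for illustration. It shows only why adding clade as a feature can separate examples that look identical when position is the sole input.

```python
from collections import defaultdict

# Hypothetical toy records (clade, spike position, mutated?), not data from the study.
records = [
    ("21K", 493, 1),
    ("21K", 501, 1),
    ("20A", 493, 0),
    ("20A", 501, 0),
    ("20A", 614, 1),
    ("21K", 614, 1),
]

def encode(record, clades, use_clade):
    """Turn one record into a numeric feature vector (assumed encoding)."""
    clade, position, _ = record
    features = [position]
    if use_clade:
        # One-hot encode the clade so a model can learn clade-specific patterns.
        features += [1 if clade == c else 0 for c in clades]
    return features

def conflicting_rows(X, y):
    """Count distinct feature vectors that appear with more than one label."""
    labels = defaultdict(set)
    for features, label in zip(X, y):
        labels[tuple(features)].add(label)
    return sum(1 for seen in labels.values() if len(seen) > 1)

clades = sorted({clade for clade, _, _ in records})
X_mutation_only = [encode(r, clades, use_clade=False) for r in records]
X_with_clade = [encode(r, clades, use_clade=True) for r in records]
y = [label for _, _, label in records]

print("ambiguous inputs without clade:", conflicting_rows(X_mutation_only, y))
print("ambiguous inputs with clade:", conflicting_rows(X_with_clade, y))
```

In this toy set, two positions carry contradictory labels when clade is omitted, and none do once clade is encoded. Feature vectors of this shape could be passed to any tabular classifier, such as the XGBoost model named in the abstract.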

Keywords