Scientific Reports (Nov 2024)

Prediction of viral oncoproteins through the combination of generative adversarial networks and machine learning techniques

  • Jorge F. Beltrán,
  • Lisandra Herrera-Belén,
  • Alejandro J. Yáñez,
  • Luis Jimenez

DOI
https://doi.org/10.1038/s41598-024-77028-y
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 11

Abstract

Read online

Abstract Viral oncoproteins play crucial roles in transforming normal cells into cancer cells, representing a significant factor in the etiology of various cancers. Traditionally, identifying these oncoproteins is both time-consuming and costly. With advancements in computational biology, bioinformatics tools based on machine learning have emerged as effective methods for predicting biological activities. Here, for the first time, we propose an innovative approach that combines Generative Adversarial Networks (GANs) with supervised learning methods to enhance the accuracy and generalizability of viral oncoprotein prediction. Our methodology evaluated multiple machine learning models, including Random Forest, Multilayer Perceptron, Light Gradient Boosting Machine, eXtreme Gradient Boosting, and Support Vector Machine. In ten-fold cross-validation on our training dataset, the GAN-enhanced Random Forest model demonstrated superior performance metrics: 0.976 accuracy, 0.976 F1 score, 0.977 precision, 0.976 sensitivity, and 1.0 AUC. During independent testing, this model achieved 0.982 accuracy, 0.982 F1 score, 0.982 precision, 0.982 sensitivity, and 1.0 AUC. These results establish our new tool, VirOncoTarget, accessible via a web application. We anticipate that VirOncoTarget will be a valuable resource for researchers, enabling rapid and reliable viral oncoprotein prediction and advancing our understanding of their role in cancer biology.

Keywords