Qualitative Perturbation Analysis and Machine Learning: Elucidating Bacterial Optimization of Tryptophan Production
Miguel Angel Ramos-Valdovinos,
Prisciluis Caheri Salas-Navarrete,
Gerardo R. Amores,
Ana Lilia Hernández-Orihuela,
Agustino Martínez-Antonio
Affiliations
Miguel Angel Ramos-Valdovinos
Laboratorio de Ingeniería Biológica, Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-Unidad Irapuato), Km 9.6 Carr. Irapuato-León, Irapuato 36824, Guanajuato, Mexico
Prisciluis Caheri Salas-Navarrete
Data Science Manager Analytics and Data Governance, Nacional de Drogas, Av. Vasco de Quiroga No. 3100, Col. Centro de Ciudad Santa Fe, Álvaro Obregón, Mexico City 01210, Mexico
Gerardo R. Amores
Laboratorio de Ingeniería Biológica, Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-Unidad Irapuato), Km 9.6 Carr. Irapuato-León, Irapuato 36824, Guanajuato, Mexico
Ana Lilia Hernández-Orihuela
Biofab México, 5 de Mayo No. 517, Irapuato 36500, Guanajuato, Mexico
Agustino Martínez-Antonio
Laboratorio de Ingeniería Biológica, Departamento de Ingeniería Genética, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV-Unidad Irapuato), Km 9.6 Carr. Irapuato-León, Irapuato 36824, Guanajuato, Mexico
L-tryptophan is an essential amino acid widely used in the pharmaceutical and feed industries. Enhancing its production in microorganisms requires activating and inactivating specific genes to direct more resources toward its synthesis. In this study, we developed a classification model based on Qualitative Perturbation Analysis and Machine Learning (QPAML). The model uses parsimonious flux balance analysis (pFBA) to obtain the optimal reactions for tryptophan production and flux scanning based on enforced objective flux (FSEOF) to perturb the fluxes of those optimal reactions, recording all resulting changes across the iML1515a genome-scale metabolic network model. The altered reaction fluxes and their relationships with tryptophan and biomass production are translated into qualitative variables, which are then classified with gradient-boosted decision trees (GBDT). As a result, groups of enzymatic reactions are predicted to be deleted, overexpressed, or attenuated for tryptophan and 30 other metabolites in E. coli, with an F1-score of 92.34%. The QPAML model can integrate diverse data types, which promises improved predictions and the discovery of complex patterns in microbial metabolic engineering. It has broad potential applications and offers valuable insights for optimizing microbial production in biotechnology.
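To illustrate how a numerical flux response might be translated into a qualitative variable, the following is a minimal sketch, not the QPAML implementation. It assumes each reaction has a flux profile recorded at increasing levels of enforced product flux, and it compares only the first and last points of that profile. The function name `qualitative_trend`, the label names, the tolerance, and the reaction identifiers and flux values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def qualitative_trend(fluxes: Sequence[float], tol: float = 1e-9) -> str:
    """Map a flux profile to a qualitative label (illustrative assumption).

    `fluxes` holds one reaction's flux at successive, increasing levels of
    enforced product flux. Only absolute magnitudes are compared, so the
    direction of a reversible reaction is ignored in this sketch.
    """
    magnitudes = [abs(v) for v in fluxes]
    # A reaction that never carries flux is labelled separately.
    if all(m <= tol for m in magnitudes):
        return "inactive"
    # Compare the last and first perturbation levels only.
    change = magnitudes[-1] - magnitudes[0]
    if change > tol:
        return "increases"
    if change < -tol:
        return "decreases"
    return "constant"


# Hypothetical flux profiles; these are not results from the study.
profiles: Dict[str, List[float]] = {
    "RXN_A": [0.1, 0.4, 0.8, 1.2],
    "RXN_B": [2.0, 1.5, 0.9, 0.0],
    "RXN_C": [0.0, 0.0, 0.0, 0.0],
    "RXN_D": [1.0, 1.0, 1.0, 1.0],
}

labels = {rxn: qualitative_trend(values) for rxn, values in profiles.items()}
for rxn, label in labels.items():
    print(rxn, label)
```

Categorical labels of this kind could then serve as input features for a tree-based classifier. How the study itself encodes the qualitative variables is not specified in this abstract.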