Machine Learning Algorithms Highlight tRNA Information Content and Chargaff’s Second Parity Rule Score as Important Features in Discriminating Probiotics from Non-Probiotics
Carlo M. Bergamini,
Nicoletta Bianchi,
Valerio Giaccone,
Paolo Catellani,
Leonardo Alberghini,
Alessandra Stella,
Stefano Biffani,
Sachithra Kalhari Yaddehige,
Tania Bobbo,
Cristian Taccioli
Affiliations
Carlo M. Bergamini
Department of Neuroscience and Rehabilitation, University of Ferrara, Via L. Borsari 46, 44121 Ferrara, Italy
Nicoletta Bianchi
Department of Translational Medicine, University of Ferrara, Via L. Borsari 46, 44121 Ferrara, Italy
Valerio Giaccone
Department of Animal Medicine, Production and Health (MAPS), University of Padua, Via F. Marzolo 5, 35131 Padua, Italy
Paolo Catellani
Department of Animal Medicine, Production and Health (MAPS), University of Padua, Via F. Marzolo 5, 35131 Padua, Italy
Leonardo Alberghini
Department of Animal Medicine, Production and Health (MAPS), University of Padua, Via F. Marzolo 5, 35131 Padua, Italy
Alessandra Stella
Consiglio Nazionale delle Ricerche (CNR), Istituto di Biologia e Biotecnologia Agraria (IBBA), Via Edoardo Bassini 15, 20133 Milano, Italy
Stefano Biffani
Consiglio Nazionale delle Ricerche (CNR), Istituto di Biologia e Biotecnologia Agraria (IBBA), Via Edoardo Bassini 15, 20133 Milano, Italy
Sachithra Kalhari Yaddehige
Department of Animal Medicine, Production and Health (MAPS), University of Padua, Via F. Marzolo 5, 35131 Padua, Italy
Tania Bobbo
Consiglio Nazionale delle Ricerche (CNR), Istituto di Biologia e Biotecnologia Agraria (IBBA), Via Edoardo Bassini 15, 20133 Milano, Italy
Cristian Taccioli
Department of Animal Medicine, Production and Health (MAPS), University of Padua, Via F. Marzolo 5, 35131 Padua, Italy
Probiotic bacteria are microorganisms with beneficial effects on human health and are currently used in numerous food supplements. However, no selection process can effectively distinguish probiotic from non-probiotic organisms on the basis of their genomic characteristics. In the current study, four Machine Learning algorithms were employed to identify probiotic bacteria based on their DNA characteristics. Although the prediction accuracies of all algorithms were excellent, the Neural Network returned the highest scores on all evaluation metrics, discriminating probiotics from non-probiotics with an accuracy greater than 90%. Interestingly, our analysis also highlighted the information content of the tRNA sequences as the most important feature in distinguishing the two groups of organisms, probably because tRNAs have regulatory functions and might have allowed probiotics to evolve faster in the human gut environment. Using the methodology presented here, it was also possible to identify seven promising new probiotics that have a higher information content in their tRNA sequences than non-probiotics. In conclusion, we show for the first time that Machine Learning methods can discriminate human probiotic from non-probiotic organisms, and we identify the information within tRNA sequences as the most important genomic feature for distinguishing them.
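To make the two features named in the title more concrete, the following minimal sketch shows one way such quantities could be computed from a DNA sequence. It is an illustration under stated assumptions, not the procedure used in the study: information content is taken here to be the Shannon entropy of the base composition (in bits per base), and the Chargaff's second parity rule score is taken to be the mean relative deviation between A and T counts and between C and G counts on a single strand. The function names `shannon_information` and `chargaff_second_parity_score` are hypothetical.

```python
import math
from collections import Counter

# Canonical DNA bases; any other symbol (e.g. N) is ignored.
BASES = {"A", "C", "G", "T"}


def shannon_information(seq: str) -> float:
    """Shannon entropy of the base composition, in bits per base.

    Assumption: this stands in for "information content"; the exact
    definition used in the study is not given in this section.
    """
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def chargaff_second_parity_score(seq: str) -> float:
    """Mean relative deviation of A vs T and C vs G on one strand.

    Assumption: 0.0 means the strand follows the rule exactly
    (A == T and C == G); larger values mean stronger deviation.
    """
    counts = Counter(b for b in seq.upper() if b in BASES)

    def relative_difference(x: str, y: str) -> float:
        pair_total = counts[x] + counts[y]
        if pair_total == 0:
            return 0.0
        return abs(counts[x] - counts[y]) / pair_total

    return (relative_difference("A", "T") + relative_difference("C", "G")) / 2


if __name__ == "__main__":
    example = "ACGTTGCAAT"  # toy sequence, not real data
    print(shannon_information(example))
    print(chargaff_second_parity_score(example))
```

Per-genome values of this kind could then serve as input columns for a classifier; which features and definitions the study actually used is described in its methods, not in this sketch.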