Classification of bioactive peptides: A systematic benchmark of models and encodings

Edoardo Bizzotto; Guido Zampieri; Laura Treu; Pasquale Filannino; Raffaella Di Cagno; Stefano Campanaro

Computational and Structural Biotechnology Journal (Dec 2024)


Affiliations

Edoardo Bizzotto: Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy
Guido Zampieri: Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy; Corresponding author.
Laura Treu: Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy
Pasquale Filannino: Department of Soil, Plant and Food Science, University of Bari Aldo Moro, Via G. Amendola 165/a, Bari 70126, Italy
Raffaella Di Cagno: Faculty of Agricultural, Environmental and Food Sciences, Free University of Bolzano, Piazza Università, 5, Bolzano 39100, Italy
Stefano Campanaro: Department of Biology, University of Padua, Via U. Bassi 58/b, Padova 35131, Italy

Journal volume & issue: Vol. 23
pp. 2442 – 2452

Abstract


Bioactive peptides are short amino acid chains that possess biological activity and exert physiological effects relevant to human health. Despite their therapeutic value, their identification remains a major problem, as it relies mainly on time-consuming in vitro tests. While bioinformatic tools for identifying bioactive peptides are available, they focus on specific functional classes and have not been systematically tested in realistic settings. To tackle this problem, we gathered bioactive peptide sequences and functions from a variety of databases to generate a unified collection of bioactive peptides from microbial fermentation. This collection was organized into nine functional classes, including some previously studied classes and some unexplored ones, such as immunomodulatory, opioid and cardiovascular peptides. After assessing their sequence properties, we tested four alternative encoding methods in combination with a multitude of machine learning algorithms, ranging from basic classifiers such as logistic regression to advanced models such as BERT. Tests on a total of 171 models showed that, while some functions are intrinsically easier to detect, no single combination of classifier and encoder worked well for all classes. For this reason, we unified the best individual model for each class into CICERON (Classification of bIoaCtive pEptides fRom micrObial fermeNtation), a tool for the functional classification of peptides. On our realistic benchmark dataset, state-of-the-art classifiers underperformed compared to the models included in CICERON. Altogether, our work provides a tool for real-world peptide classification and can serve as a benchmark for future model development.
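The abstract describes a two-level design. Each functional class receives its own encoder and classifier pair, chosen from many candidates, and the per-class winners are combined into a single tool. The following is a minimal sketch of that idea under stated assumptions; it is not CICERON's actual code. The abstract does not name the four encodings, so amino acid composition (AAC) appears here purely as an illustrative stand-in. All function names (`aac_encode`, `pick_best`, `build_predictor`) are hypothetical, and any scores or models passed in are toy placeholders rather than results from the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# The 20 standard amino acids; this alphabet and its order are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Combo = Tuple[str, str]          # (encoder name, classifier name), hypothetical labels
Model = Callable[[str], float]   # returns P(peptide belongs to the class)


def aac_encode(seq: str) -> List[float]:
    """Amino acid composition (an assumed example encoding): the fraction
    of each standard residue in seq. Non-standard symbols such as X are
    ignored; a sequence with no standard residues maps to all zeros."""
    counts = Counter(seq.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pick_best(scores: Dict[str, Dict[Combo, float]]) -> Dict[str, Combo]:
    """For each class, keep the (encoder, classifier) pair with the highest
    validation score. On ties, max() keeps the first pair in insertion order."""
    return {cls: max(combos, key=combos.__getitem__)
            for cls, combos in scores.items()}


def build_predictor(best: Dict[str, Combo],
                    models: Dict[Tuple[str, str, str], Model],
                    threshold: float = 0.5) -> Callable[[str], List[str]]:
    """Combine one binary model per class into a multi-label predictor.

    `models` is keyed by (class, encoder, classifier). Only the pair selected
    for each class is used, and a peptide may receive several labels."""
    def predict(seq: str) -> List[str]:
        return sorted(cls for cls, (enc, clf) in best.items()
                      if models[(cls, enc, clf)](seq) >= threshold)
    return predict
```

In this sketch, each class is treated as an independent binary problem. That choice is what allows a different encoder and classifier to win for each class, which is consistent with the abstract's finding that no single combination worked well for all classes. Replacing AAC with another encoding changes only what each model computes internally; the per-class selection and combination steps stay the same.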

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal
