Data in Brief (Jun 2023)
Manually curated dataset of catalytic peptides for ester hydrolysis
Abstract
Catalytic peptides are low cost biomolecules able to catalyse chemical reactions such as ester hydrolysis. This dataset provides a list of catalytic peptides currently reported in literature. Several parameters were evaluated, including sequence length, composition, net charge, isoelectric point, hydrophobicity, self-assembly propensity and mechanism of catalysis. Along with the analysis of physico-chemical properties, the SMILES representation for each sequence was generated to provide an easy-to-use means of training machine learning models. This offers a unique opportunity for the development and validation of proof-of-concept predictive models. Being a reliable manually curated dataset, it also enables the benchmark for comparison of new models or models trained on automatically gathered peptide-oriented datasets. Moreover, the dataset provides an insight in the currently developed catalytic mechanisms and can be used as the foundation for the development of next-generation peptide-based catalysts.