Bioinformatics methods for identification of amyloidogenic peptides show robustness to misannotated training data

Natalia Szulc; Michał Burdukiewicz; Marlena Gąsior-Głogowska; Jakub W. Wojciechowski; Jarosław Chilimoniuk; Paweł Mackiewicz; Tomas Šneideris; Vytautas Smirnovas; Malgorzata Kotulska

doi:10.1038/s41598-021-86530-6

Scientific Reports (Apr 2021)

Affiliations

Natalia Szulc: Department of Biomedical Engineering, Wroclaw University of Science and Technology
Michał Burdukiewicz: Medical University of Bialystok
Marlena Gąsior-Głogowska: Department of Biomedical Engineering, Wroclaw University of Science and Technology
Jakub W. Wojciechowski: Department of Biomedical Engineering, Wroclaw University of Science and Technology
Jarosław Chilimoniuk: Faculty of Biotechnology, University of Wroclaw
Paweł Mackiewicz: Faculty of Biotechnology, University of Wroclaw
Tomas Šneideris: Life Sciences Center, Institute of Biotechnology, Vilnius University
Vytautas Smirnovas: Life Sciences Center, Institute of Biotechnology, Vilnius University
Malgorzata Kotulska: Department of Biomedical Engineering, Wroclaw University of Science and Technology

DOI: https://doi.org/10.1038/s41598-021-86530-6
Journal volume & issue: Vol. 11, no. 1, pp. 1–11

Abstract

Several disorders, such as Alzheimer’s and Parkinson’s diseases, are related to amyloid aggregation of proteins. Amyloid proteins form fibrils of aggregated beta structures. Fibril formation is preceded by the formation of oligomers, the most cytotoxic species. Determining amyloidogenicity experimentally is tedious and costly. The most reliable identification of amyloids is obtained with high-resolution microscopy, such as electron microscopy or atomic force microscopy (AFM). More frequently, less expensive and faster methods are used, especially infrared (IR) spectroscopy or Thioflavin T staining. Different experimental methods do not always agree, especially when amyloid peptides do not readily form fibrils but form oligomers instead. This may lead to peptide misclassification and mislabeling. Several bioinformatics methods have been proposed for in silico identification of amyloids, many of them based on machine learning. The effectiveness of these methods depends heavily on accurate annotation of the reference training data obtained from in vitro experiments. We study how robust bioinformatics methods are to weak supervision, that is, to imperfect training data. AmyloGram and three other amyloid predictors were applied. The results showed that a certain degree of misannotation in the reference data can be eliminated by the bioinformatics tools, even if the misannotated peptides belonged to their training sets. The computational results are supported by new experiments with IR and AFM methods.
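
The idea of robustness to misannotated training data can be illustrated with a toy experiment. The sketch below is not the authors' pipeline and does not use AmyloGram or any real peptide data: the one-dimensional synthetic scores, the nearest-centroid classifier, the 10% label-flip rate, and all function names are assumptions chosen only for illustration. It trains on deliberately corrupted labels and then checks how many of the corrupted training examples the model still assigns to their true class.

```python
import random


def make_data(n_per_class, rng):
    """Synthetic 1-D scores (hypothetical): class 0 centred at -2, class 1 at +2."""
    data = []
    for label, mean in ((0, -2.0), (1, 2.0)):
        for _ in range(n_per_class):
            data.append((rng.gauss(mean, 1.0), label))
    return data


def flip_labels(labels, fraction, rng):
    """Flip a fraction of binary labels; return noisy labels and flipped indices."""
    n_flip = int(len(labels) * fraction)
    flipped = set(rng.sample(range(len(labels)), n_flip))
    noisy = [1 - y if i in flipped else y for i, y in enumerate(labels)]
    return noisy, flipped


def fit_centroids(xs, labels):
    """Nearest-centroid 'training': mean feature value per (possibly noisy) class."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in zip(xs, labels):
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in (0, 1)}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


def run_experiment(seed=0, n_per_class=200, noise=0.1):
    rng = random.Random(seed)
    data = make_data(n_per_class, rng)
    xs = [x for x, _ in data]
    true = [y for _, y in data]
    noisy, flipped = flip_labels(true, noise, rng)

    # Train on the corrupted labels only.
    centroids = fit_centroids(xs, noisy)
    preds = [predict(centroids, x) for x in xs]

    # Accuracy against the true labels, over the whole training set.
    accuracy = sum(p == t for p, t in zip(preds, true)) / len(xs)
    # Share of mislabeled training examples predicted with their true class.
    recovered = sum(preds[i] == true[i] for i in flipped) / len(flipped)
    return accuracy, recovered


if __name__ == "__main__":
    acc, rec = run_experiment()
    print(f"accuracy vs. true labels: {acc:.3f}")
    print(f"mislabeled training examples recovered: {rec:.3f}")
```

In this well-separated toy setting, a moderate amount of random label noise shifts the class centroids only slightly, so most flipped examples are still predicted with their true class. Real amyloid data are far less cleanly separated, so the sketch shows the mechanism only and gives no estimate of the effect size.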

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
