Improved prediction of MHC-peptide binding using protein language models

Nasser Hashemi; Boran Hao; Mikhail Ignatov; Ioannis Ch. Paschalidis; Pirooz Vakili; Sandor Vajda; Dima Kozakov

doi:10.3389/fbinf.2023.1207380

Frontiers in Bioinformatics (Aug 2023)

Affiliations

Nasser Hashemi: Division of Systems Engineering, Boston University, Boston, MA, United States
Boran Hao: Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States
Mikhail Ignatov: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
Mikhail Ignatov: Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States
Ioannis Ch. Paschalidis: Division of Systems Engineering, Boston University, Boston, MA, United States
Ioannis Ch. Paschalidis: Department of Electrical and Computer Engineering, Boston University, Boston, MA, United States
Ioannis Ch. Paschalidis: Department of Biomedical Engineering, Boston University, Boston, MA, United States
Pirooz Vakili: Division of Systems Engineering, Boston University, Boston, MA, United States
Sandor Vajda: Division of Systems Engineering, Boston University, Boston, MA, United States
Sandor Vajda: Department of Biomedical Engineering, Boston University, Boston, MA, United States
Sandor Vajda: Department of Chemistry, Boston University, Boston, MA, United States
Dima Kozakov: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, United States
Dima Kozakov: Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY, United States
Dima Kozakov: Department of Biomedical Engineering, Boston University, Boston, MA, United States

Journal volume & issue: Vol. 3

Abstract

Major histocompatibility complex Class I (MHC-I) molecules bind peptides derived from intracellular antigens and present them on the cell surface, where T cells of the immune system can detect them. Elucidating this presentation process is essential for the regulation and potential manipulation of the cellular immune system. Predicting whether a given peptide binds to an MHC molecule is an important step in this process and has motivated many computational approaches. NetMHCpan, a pan-specific model for predicting the binding of peptides to any MHC molecule, is one of the most widely used methods; it addresses this binary classification problem using shallow neural networks. The recent success of Deep Learning (DL) methods, especially pretrained models based on Natural Language Processing (NLP), in various applications, including protein structure determination, motivated us to explore their use for this problem. Specifically, we consider the application of deep learning models pretrained on large datasets of protein sequences to predict MHC Class I-peptide binding. Using the standard performance metrics in this area, and the same training and test sets, we show that our models outperform NetMHCpan4.1, currently considered the state of the art.

Published in Frontiers in Bioinformatics

ISSN: 2673-7647 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.frontiersin.org/journals/bioinformatics
