IEEE Access (Jan 2024)
TlcMHCpan: A Novel Deep Learning Model for Enhanced Pan-Specific Prediction of Peptide-HLA Binding
Abstract
The interaction between Human Leukocyte Antigens (HLA) and peptides is central to cellular immunology and crucial for the development of the immune system and for peptide-based drug design. In machine learning for peptide-HLA (pHLA) binding prediction, the mainstream methods are neural network-based models that improve prediction accuracy and efficiency by modeling the interactions between HLA and peptides. Most peptides that bind to class I HLA are 9 amino acids long, so these models mainly address binding prediction for peptides of a fixed length of 9. In addition, most neural network models rely on pseudo-sequence encoding, which is designed around 34 key positions in the peptide-HLA binding structure. Although this encoding provides important contextual clues for the model, it may not fully capture the complex interactions between HLA and peptides, which can limit prediction accuracy. To address these issues, we introduce TlcMHCpan, a novel pan-specific prediction model capable of handling peptide sequences of varying lengths. It combines deep learning techniques including Transformer, LSTM, and CNN, and incorporates a self-attention mechanism to strengthen feature extraction. We comprehensively evaluated TlcMHCpan on the latest benchmark provided by the Immune Epitope Database (IEDB). Across the 38 benchmark datasets, TlcMHCpan achieved the highest AUC score on 11, and on 6 of those it was the sole top performer.
Keywords