Scientific Data (Jan 2023)
InvitroSPI and a large database of proteasome-generated spliced and non-spliced peptides
Abstract
Abstract Noncanonical epitopes presented by Human Leucocyte Antigen class I (HLA-I) complexes to CD8+ T cells attracted the spotlight in the research of novel immunotherapies against cancer, infection and autoimmunity. Proteasomes, which are the main producers of HLA-I-bound antigenic peptides, can catalyze both peptide hydrolysis and peptide splicing. The prediction of proteasome-generated spliced peptides is an objective that still requires a reliable (and large) database of non-spliced and spliced peptides produced by these proteases. Here, we present an extended database of proteasome-generated spliced and non-spliced peptides, which was obtained by analyzing in vitro digestions of 80 unique synthetic polypeptide substrates, measured by different mass spectrometers. Peptides were identified through invitroSPI method, which was validated through in silico and in vitro strategies. The peptide product database contains 16,631 unique peptide products (5,493 non-spliced, 6,453 cis-spliced and 4,685 trans-spliced peptide products), and a substrate sequence variety that is a valuable source for predictors of proteasome-catalyzed peptide hydrolysis and splicing. Potential artefacts and skewed results due to different identification and analysis strategies are discussed.