Scientific Reports (Jul 2017)

Comprehensive analysis of human protein N-termini enables assessment of various protein forms

  • Jeonghun Yeom,
  • Shinyeong Ju,
  • YunJin Choi,
  • Eunok Paek,
  • Cheolju Lee

DOI
https://doi.org/10.1038/s41598-017-06314-9
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 13

Abstract

Various forms of a protein (proteoforms) are generated by genetic variation, alternative splicing, alternative translation initiation, co- or post-translational modification, and proteolysis. Different proteoforms can be discovered in part by characterizing their N-terminal sequences. Here, we introduce an N-terminal peptide enrichment method, Nrich. The method is based on filter-aided negative selection and uses two N-blocking reagents and two endoproteases. We identified 6,525 acetylated (or partially acetylated) and 6,570 free protein N-termini arising from 5,727 proteins in HEK293T human cells. The protein N-termini included translation initiation sites annotated in the UniProtKB database, putative alternative translation initiation sites, and N-terminal sites exposed after signal/transit/pro-peptide removal or unknown processing, revealing various proteoforms in cells. In addition, 46 novel protein N-termini were identified in 5′ untranslated region (UTR) sequences with pseudo start codons. Our data on the observed N-terminal sequences of mature proteins constitute a useful resource that may provide a better understanding of the various proteoforms in cells.
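
The N-terminus categories named in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the function name, its parameters, the category labels, and the rule that a neighbouring methionine suggests an alternative initiation site are all assumptions made for illustration only.

```python
from typing import Optional


def classify_n_terminus(
    protein_seq: str,
    start: int,
    processed_prefix_end: Optional[int] = None,
) -> str:
    """Assign an observed N-terminus to one of the categories in the abstract.

    protein_seq: annotated protein sequence (hypothetical input).
    start: 1-based position of the first residue of the observed peptide.
    processed_prefix_end: 1-based position of the last residue of an annotated
        signal/transit/pro-peptide, if the protein has one.
    """
    if start < 1 or start > len(protein_seq):
        raise ValueError("start is outside the protein sequence")

    # The peptide begins at the annotated start of the protein.
    if start == 1:
        return "annotated initiation site"

    # The peptide begins right after the initiator methionine.
    if start == 2 and protein_seq[0] == "M":
        return "annotated initiation site (initiator Met removed)"

    # The peptide begins right after an annotated cleaved prefix.
    if processed_prefix_end is not None and start == processed_prefix_end + 1:
        return "signal/transit/pro-peptide removal"

    # Assumed heuristic: an internal Met at, or just before, the observed
    # start hints at an alternative translation initiation site.
    if protein_seq[start - 1] == "M" or protein_seq[start - 2] == "M":
        return "putative alternative initiation site"

    return "unknown processing"
```

For example, with the made-up sequence `MASKLLMTEQ`, a peptide starting at position 7 (an internal Met) would be labelled a putative alternative initiation site, while one starting at position 4 with no annotated prefix would fall under unknown processing.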