Archives of Biological Sciences (Jan 2016)
DUF1070 as a signature domain of a subclass of arabinogalactan peptides
Abstract
Over 20% of all protein domains are currently annotated as “domains of unknown function” or DUFs. In a recently identified Centaurium erythraea arabinogalactan peptide, CeAGP3 (AGN92423), a conserved DUF1070 domain was found. Since identifying functions for DUFs is important in systems biology, we have analyzed the distribution and structure of the DUF1070 domain (pfam06376) using a set of bioinformatics tools. There are 271 publicly available DUF1070 members from 25 diverse families of vascular plants, and most are short sequences (50-100 aa). An N-terminal signal peptide (Nsp) was found in almost all complete sequences. In 233 sequences, at least two noncontiguous prolines were found as clustered dipeptides predicted to be hydroxylated and glycosylated with type II arabino-3,6-galactans, thus representing AG-II glycomodules. In addition, 35 sequences contained a region rich in basic residues (basic linker, BL). The N-terminal part of the DUF1070 domain comprises all or part of the AG-II glycomodule and/or BL, while the highly conserved C-terminus is a region of 26 aa, termed SH26. In 212 sequences, SH26 was a typical glycosylphosphatidylinositol lipid anchor signal peptide (GPIsp), but in 83 cases GPIsp was not predicted due to software constraints. In sequences where both Nsp and GPIsp were predicted, the length of the mature peptides could be calculated, and it was 10-16 aa. Our analysis suggests that DUF1070 members are arabinogalactan (AG) peptides, of which the majority are GPI-anchored. DUF1070 is the only conserved domain found in classical arabinogalactan proteins and AG peptides. The SH26 region can be used for mining and annotation of AG peptides. [Project of the Ministry of Science of the Republic of Serbia, No. TR31019]
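The mature-peptide length mentioned in the abstract follows from simple arithmetic: the precursor length minus the predicted Nsp and GPIsp lengths. The following is a minimal sketch of that calculation, not the authors' pipeline. The function name `mature_peptide`, the 22-aa Nsp length, and the toy precursor sequence are hypothetical placeholders; treating the cleaved GPIsp as exactly the 26-aa SH26 region is also an assumption made only for illustration.

```python
def mature_peptide(precursor: str, nsp_len: int, gpisp_len: int) -> str:
    """Return the part of a precursor left after removing a predicted
    N-terminal signal peptide (nsp_len residues) and a predicted
    C-terminal GPI anchor signal (gpisp_len residues)."""
    if nsp_len < 0 or gpisp_len < 0:
        raise ValueError("signal lengths must be non-negative")
    if nsp_len + gpisp_len > len(precursor):
        raise ValueError("signals are longer than the precursor")
    return precursor[nsp_len:len(precursor) - gpisp_len]


# Toy precursor (made up for illustration, not a real DUF1070 member):
# 22 placeholder residues for the Nsp, a 12-aa core, 26 placeholder
# residues standing in for the SH26/GPIsp region.
toy_nsp = "M" + "A" * 21
toy_core = "APAPSPTSDGTS"
toy_gpisp = "G" * 26
toy_precursor = toy_nsp + toy_core + toy_gpisp

core = mature_peptide(toy_precursor, nsp_len=len(toy_nsp), gpisp_len=len(toy_gpisp))
print(len(toy_precursor), len(core))
```

With these placeholder values the 60-aa precursor yields a 12-aa mature peptide, which falls inside the 10-16 aa range reported above. Real Nsp and GPIsp boundaries would come from the respective prediction tools rather than fixed lengths.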
Keywords