Plant Direct (Dec 2023)

Identification and preliminary characterization of conserved uncharacterized proteins from Chlamydomonas reinhardtii, Arabidopsis thaliana, and Setaria viridis

  • Eric P. Knoshaug,
  • Peipei Sun,
  • Ambarish Nag,
  • Huong Nguyen,
  • Erin M. Mattoon,
  • Ningning Zhang,
  • Jian Liu,
  • Chen Chen,
  • Jianlin Cheng,
  • Ru Zhang,
  • Peter St. John,
  • James Umen

DOI
https://doi.org/10.1002/pld3.527
Journal volume & issue
Vol. 7, no. 12

Abstract

The rapid accumulation of sequenced plant genomes in the past decade has outpaced progress on the still difficult problem of genome‐wide protein‐coding gene annotation. A substantial fraction of protein‐coding genes in all plant genomes are poorly annotated or unannotated and remain functionally uncharacterized. We identified unannotated proteins in three model organisms representing distinct branches of the green lineage (Viridiplantae): Arabidopsis thaliana (eudicot), Setaria viridis (monocot), and Chlamydomonas reinhardtii (chlorophyte alga). Using similarity searching, we identified a subset of unannotated proteins that were conserved among these species and defined them as Deep Green proteins. Bioinformatic, genomic, and structural predictions were performed to begin classifying Deep Green genes and proteins. Compared to the whole proteome of each species, the Deep Green set was enriched for proteins with predicted chloroplast‐targeting signals, suggestive of photosynthetic or plastid functions, a result consistent with enrichment for daylight‐phase diurnal expression patterns. Structural predictions using AlphaFold and comparisons to known structures showed that a significant proportion of Deep Green proteins may possess novel folds. Though available for only three organisms, the Deep Green genes and proteins provide a starting resource of high‐value targets for further investigation of potentially new protein structures and functions conserved across the green lineage.

Keywords