Communications Biology (Mar 2025)
Genome assembly of Stewartia sinensis reveals origin and evolution of orphan genes in Theaceae
Abstract
Orphan genes play crucial roles in diverse biological processes, but their evolutionary trajectories and functional divergence remain largely unexplored. The Theaceae family, which includes the economically and culturally important tea plant, offers a distinctive model for examining these aspects. Here, we integrated Nanopore long-read sequencing, Illumina short-read sequencing, and Hi-C methods to produce a 2.95 Gb pseudo-chromosomal genome assembly of Stewartia sinensis, a member of the earliest-diverging tribe of Theaceae. Comparative genomic analysis revealed the absence of recent whole-genome duplication events in the Theaceae ancestor, highlighting tandem duplications as the predominant mechanism of gene expansion. We identified 31,331 orphan genes, some of which appear to have ancient origins, suggesting early emergence with frequent gains and losses, while others seem more specific and recent. Notably, orphan genes are shorter and have fewer exons and functional domains than genes of much earlier origin, such as transcription factors. Moreover, tandem duplication contributes significantly to the adaptive evolution and characteristic diversity of Theaceae, and it is also a major mechanism driving the origination of orphan genes. This study illuminates the evolutionary dynamics of orphan genes and provides a valuable resource for understanding the origin and evolution of tea plant flavor and for enhancing genetic breeding efforts.