Scientific Data (May 2025)
Two haplotype-resolved telomere-to-telomere genome assemblies of Xanthoceras sorbifolium
Abstract
Yellowhorn (Xanthoceras sorbifolium) is widely used in northern China for landscaping, desertification control, and oil production. However, the lack of high-quality genomes has hindered breeding and evolutionary studies. Here, we present the first haplotype-resolved, telomere-to-telomere (T2T) yellowhorn genomes of PBN-43 (white single-flowered) and PBN-126 (white double-flowered), assembled from PacBio HiFi and Hi-C data. The assemblies range in size from 464.34 Mb to 468.97 Mb and include all centromeres and telomeres. Genome annotation revealed that, averaged across all haplotypes, 67.99% (317.09 Mb) of the yellowhorn genome consists of repetitive elements. The number of protein-coding genes ranges from 35,039 to 35,174 among assemblies, an average increase of 50.16% over the first published yellowhorn genome. Additionally, 93.90% of the annotated genes have functional annotations. We found that yellowhorn experienced a burst of long terminal repeat retrotransposons (LTR-RTs) within the last 0.45–0.48 million years. These data provide a resource for investigating genomic variations, phylogenetic relationships, duplication modes, and the distribution of nucleotide-binding leucine-rich repeat (NLR) genes, and support further research into yellowhorn breeding.