Scientific Data (May 2025)

Two haplotype-resolved telomere-to-telomere genome assemblies of Xanthoceras sorbifolium

  • Yu Liu,
  • Yijun Chen,
  • Zizheng Ren,
  • Kui Li,
  • Xu Wang,
  • Kai Wu,
  • Jinfeng Liu,
  • Nir Sade,
  • Hang He,
  • Shouke Li,
  • Haiyang Jiang,
  • Xue Han

DOI
https://doi.org/10.1038/s41597-025-05057-x
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 10

Abstract

Read online

Abstract Yellowhorn (Xanthoceras sorbifolium) is widely used in northern China for landscaping, desertification control, and oil production. However, the lack of high-quality genomes has hindered breeding and evolutionary studies. Here, we present the first haplotype-resolved, telomere-to-telomere (T2T) yellowhorn genomes of PBN-43 (white single-flowered) and PBN-126 (white double-flowered) using PacBio HiFi and Hi-C data. These assemblies range from 464.34 Mb to 468.97 Mb and include all centromeres and telomeres. Genome annotation revealed that an average of 67.99% (317.09 Mb) of yellowhorn genomic regions consist of repetitive elements across all haplotypes. The number of protein-coding genes ranges from 35,039 to 35,174 among assemblies, representing an average 50.16% increase over the first published yellowhorn genome. Additionally, 93.90% of the annotated genes have functional annotations. We found yellowhorn experienced an LTR-RT burst during the last 0.45–0.48 Mya. These data provide a resource for investigating genomic variations, phylogenetic relationships, duplication modes, and the distribution of nucleotide-binding leucine-rich repeat (NLR) genes, and support further research into yellowhorn breeding.