Scientific Data (Apr 2024)
High-quality faba bean reference transcripts generated using PacBio and Illumina RNA-seq data
Abstract
Abstract The genome of faba bean was first published in 2023. To promote future molecular breeding studies, we improved the quality of the faba genome based on high-density genetic maps and the Illumina and Pacbio RNA-seq datasets. Two high-density genetic maps were used to conduct the scaffold ordering and orientation of faba bean, culminating in an increased length (i.e., 14.28 Mbp) of chromosomes and a decrease in the number of scaffolds by 45. In gene model mining and optimisation, the PacBio and Illumina RNA-seq datasets from 37 samples allowed for the identification and correction 121,606 transcripts, and the data facilitated a prediction of 15,640 alternative splicing events, 2,148 lncRNAs, and 1,752 fusion transcripts, thus allowing for a clearer understanding of the gene structures underlying the faba genome. Moreover, a total of 38,850 new genes including 56,188 transcripts were identified compared with the reference genome. Finally, the genetic data of the reference genome was integrated and a comprehensive and complete faba bean transcriptome sequence of 103,267 transcripts derived from 54,753 uni-genes was formed.