Frontiers in Genetics (Sep 2022)
Development of de-novo transcriptome assembly and SSRs in allohexaploid Brassica with functional annotations and identification of heat-shock proteins for thermotolerance
Abstract
Crop Brassicas contain monogenomic and digenomic species, with no evidence of a trigenomic Brassica in nature. Through somatic fusion (Sinapis alba + B. juncea), a novel allohexaploid trigenomic Brassica (H1 = AABBSS; 2n = 60) was produced and used for transcriptome analysis to uncover genes for thermotolerance, annotations, and microsatellite markers for future molecular breeding. Illumina Novaseq 6000 generated a total of 76,055,546 paired-end raw reads, which were used for de-novo assembly, resulting in the development of 486,066 transcripts. A total of 133,167 coding sequences (CDSs) were predicted from transcripts with a mean length of 507.12 bp and 46.15% GC content. The BLASTX search of CDSs against public protein databases showed a maximum of 126,131 (94.72%) and a minimum of 29,810 (22.39%) positive hits. Furthermore, 953,773 gene ontology (GO) terms were found in 77,613 (58.28%) CDSs, which were divided into biological processes (49.06%), cellular components (31.67%), and molecular functions (19.27%). CDSs were assigned to 144 pathways by a pathway study using the KEGG database and 1,551 pathways by a similar analysis using the Reactome database. Further investigation led to the discovery of genes encoding over 2,000 heat shock proteins (HSPs). The discovery of a large number of HSPs in allohexaploid Brassica validated our earlier findings for heat tolerance at seed maturity. A total of 15,736 SSRs have been found in 13,595 CDSs, with an average of one SSR per 4.29 kb length and an SSR frequency of 11.82%. The first transcriptome assembly of a meiotically stable allohexaploid Brassica has been given in this article, along with functional annotations and the presence of SSRs, which could aid future genetic and genomic studies.
Keywords