BMC Plant Biology (Feb 2022)
Whole-genome resequencing of Coffea arabica L. (Rubiaceae) genotypes identify SNP and unravels distinct groups showing a strong geographical pattern
Abstract
Abstract Background Coffea arabica L. is an economically important agricultural crop and the most popular beverage worldwide. As a perennial crop with recalcitrant seed, conservation of the genetic resources of coffee can be achieved through the complementary approach of in-situ and ex-situ field genebank. In Ethiopia, a large collection of C. arabica L. germplasm is preserved in field gene banks. Here, we report the whole-genome resequencing of 90 accessions from Choche germplasm bank representing garden and forest-based coffee production systems using Illumina sequencing technology. Results The genome sequencing generated 6.41 billion paired-end reads, with a mean of 71.19 million reads per sample. More than 93% of the clean reads were mapped onto the C. arabica L. reference genome. A total of 11.08 million variants were identified, among which 9.74 million (87.9%) were SNPs (Single nucleotide polymorphisms) and 1.34 million (12.1%) were InDels. In all accessions, genomic variants were unevenly distributed across the coffee genome. The phylogenetic analysis using the SNP markers displayed distinct groups. Conclusions Resequencing of the coffee accessions has allowed identification of genetic markers, such as SNPs and InDels. The SNPs discovered in this study might contribute to the variation in important pathways of genes for important agronomic traits such as caffeine content, yield, disease, and pest in coffee. Moreover, the genome resequencing data and the genetic markers identified from 90 accessions provide insight into the genetic variation of the coffee germplasm and facilitate a broad range of genetic studies.
Keywords