پژوهشهای علوم دامی ایران (Iranian Journal of Animal Science Research) (Apr 2021)
Identification of total microsatellites in the genome of Iranian Bactrian camels using whole genome sequencing data
Abstract
Introduction
Bactrian camels are among the species most resistant to harsh environmental conditions. A camel's body temperature may vary from 34 to 41 °C over the course of a day. Camels can survive a loss of body water exceeding 25% of total body weight, whereas in non-desert mammals losses greater than 15% are fatal. Because Iran lies in one of the most arid regions of the world and faces a shortage of water resources, and given these special capabilities of camels, this species can be a valuable source of protein in the country. The study of genetic diversity is one of the most common kinds of study in domestic animals, and microsatellites are widely used in this field. Microsatellite sequences carry useful information for assessing genetic diversity within and between populations and for investigating evolutionary processes between species. The main aim of the present study was to identify the total microsatellites in the genome of Iranian Bactrian camels using whole genome sequencing data and to compare them with those of other mammals.

Materials and Methods
Genome-wide microsatellites were identified in six Bactrian camels from Ardabil province. Blood samples were collected from the jugular vein using 4 ml vacutainer tubes and stored at -20 °C until use. Whole genome sequencing of the samples was performed with Illumina HiSeq 2000 technology (Illumina, USA) in paired-end mode, with 100 bp read from each end. Quality control of the raw sequence reads was performed using FastQC. Raw reads were quality-filtered with the SLIDINGWINDOW (4:20) algorithm of Trimmomatic v0.36. After low-quality filtering, reads shorter than 40 bp were discarded. De novo assembly of the trimmed reads was done using CLC Genomics Workbench 11 (CLC Bio, Aarhus, Denmark).
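The read-filtering step described above can be illustrated with a minimal Python sketch. It is a simplified stand-in for a sliding-window quality trimmer and not Trimmomatic's own implementation: it scans 4-base windows from the 5' end, cuts the read at the start of the first window whose mean Phred quality falls below 20, and then drops reads shorter than 40 bp. The function names (`sliding_window_keep_length`, `trim_read`) and the handling of reads shorter than one window are assumptions made for illustration.

```python
def sliding_window_keep_length(quals, window=4, min_mean_q=20):
    """Return how many 5' bases precede the first low-quality window.

    quals is a list of integer Phred scores, one per base.
    """
    if len(quals) < window:
        # Simplification (assumption): a read shorter than one window is dropped.
        return 0
    # Comparing sums avoids a division: mean < q  <=>  sum < window * q.
    threshold = window * min_mean_q
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) < threshold:
            return start  # cut at the start of the failing window
    return len(quals)  # no window failed, keep the whole read


def trim_read(seq, quals, window=4, min_mean_q=20, min_len=40):
    """Trim one read; return (seq, quals) or None if it ends up too short."""
    keep = sliding_window_keep_length(quals, window, min_mean_q)
    if keep < min_len:
        return None  # mirrors the "discard reads shorter than 40 bp" rule
    return seq[:keep], quals[:keep]


if __name__ == "__main__":
    # Hypothetical 100 bp read whose second half has low quality.
    read = "A" * 100
    scores = [30] * 50 + [10] * 50
    trimmed = trim_read(read, scores)
    print(len(trimmed[0]) if trimmed else "discarded")
```

In this sketch a read is judged only from its own quality string; paired-end bookkeeping (keeping or orphaning the mate of a discarded read) is left out.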
The parameters used for de novo assembly of the trimmed reads were: mismatch cost 3, insertion and deletion cost 3, length fraction 0.5, and similarity fraction 0.8. The assembled genomes were searched for microsatellites using MISA, with motif sizes ranging from mono-nucleotide to octo-nucleotide. The minimum repeat numbers were set to 12 for mono-, 6 for di-, 5 for tri- and tetra-, 4 for penta- and hexa-, and 3 for hepta- and octo-nucleotide SSRs. Microsatellites interrupted by up to 100 nucleotides were considered compound microsatellites. In addition, the assembled genomes of several mammals (Arabian dromedary camel, Bactrian camel, alpaca, horse, cattle, sheep, and human) were downloaded and searched for microsatellite loci.

Results and Discussion
The assembled genome size of the Iranian Bactrian camels ranged from 1.90 Gb for sample one to 1.97 Gb for sample three. The N50 length of the assembled contigs ranged from 19.1 kb for sample one to 51.5 kb for sample five. The contig N50 length is one measure of genome assembly quality, and a larger value indicates a better assembly. The total number of microsatellite loci identified in the Iranian Bactrian camels ranged from 136028 for sample two to 539555 for sample three. The genomes of samples one, two, three, four, five and six contained 3.13 Mb, 2.35 Mb, 9.26 Mb, 7.1 Mb, 8.99 Mb and 8.86 Mb of microsatellites, respectively. It should be noted that the differences in microsatellite content among the Iranian Bactrian camel genomes are due to differences in assembly quality. Among the mammals examined in this study, humans with 25.7 Mb and horses with 7.81 Mb had the highest and lowest total size of microsatellites, respectively. The results revealed that the number of microsatellites decreased as motif length increased, so that mono- and di-nucleotide repeats were the most frequent.
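As a reminder of what the contig N50 reported above measures, the following Python sketch computes it under the usual definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example contig lengths are hypothetical and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # This contig is the one that pushes the cumulative
            # length past half of the assembly.
            return length
    return 0


if __name__ == "__main__":
    # Hypothetical contig lengths in bp.
    print(n50([80, 70, 50, 40, 30, 20]))
```

Because N50 depends only on the distribution of contig lengths, two assemblies of similar total size can have very different N50 values, which is why it is read here as an indicator of contiguity rather than of completeness.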
In all seven species, the ten most abundant motifs accounted for more than 74% of the identified microsatellites. The motif T was the most frequent motif in samples one and six of the Iranian Bactrian camels, Iranian dromedary camels, Bactrian camel, cattle, sheep, horses and humans. In samples two, three, four and five, the non-Iranian dromedary camel and alpaca, the motif A was the most abundant. The findings of this study provide a valuable resource for further studies on camel breeding, especially in Iranian Bactrian camels. The large number of camel SSR markers developed in this study constitutes a valuable resource for investigating genetic diversity and may support the development of breeding programs for Iranian Bactrian camels in the future.
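The search criteria given in the Materials and Methods, and the motif tallies reported in the results, can be sketched in Python as follows. This is an illustrative regular-expression scan that uses the minimum repeat numbers stated in the abstract (12, 6, 5, 5, 4, 4, 3, 3 for motif lengths 1 to 8); it is not MISA itself, it does not merge neighbouring repeats into compound microsatellites, and the names `find_ssrs`, `is_primitive` and `top_motif_share` are hypothetical.

```python
import re
from collections import Counter

# Minimum repeat number per motif length, as stated in the abstract.
MIN_REPEATS = {1: 12, 2: 6, 3: 5, 4: 5, 5: 4, 6: 4, 7: 3, 8: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    # A string is a repetition of a shorter unit exactly when it
    # occurs inside its own doubling with both end characters removed.
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One copy of the unit, followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. (TT)6 is already reported as (T)12
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


def top_motif_share(hits, k=10):
    """Fraction of all SSR loci that belong to the k most frequent motifs."""
    counts = Counter(motif for _, motif, _ in hits)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


if __name__ == "__main__":
    # Hypothetical toy sequence: (T)12, (AC)6 and a too-short (T)11 run.
    toy = "G" + "T" * 12 + "GG" + "AC" * 6 + "G" + "T" * 11 + "C"
    for start, motif, repeats in find_ssrs(toy):
        print(start, motif, repeats)
```

Applied to a whole assembly, `find_ssrs` would be called once per contig, and `top_motif_share(hits, k=10)` gives the kind of "ten most abundant motifs" proportion quoted above. Motifs are counted exactly as they appear on the scanned strand, so grouping reverse-complement or shifted forms of the same motif would need an extra normalisation step.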
Keywords