Environmental DNA (Jul 2024)
Enriching barcoding markers in environmental samples utilizing a phylogenetic probe design: Insights from mock communities
Abstract
Abstract Hybridization capture is an emerging method making use of short oligonucleotide baits to enrich DNA libraries for genomic fragments of specific organisms thus enabling detection of their presence in environmental samples. Although it offers a primer‐independent alternative to metabarcoding, little empirical work has been dedicated to characterizing the underlying biases and coupled implications for biological interpretation. Moreover, few published bioinformatic pipelines are available for designing polynucleotide capture baits from a reference sequence collection. We designed RNA‐baits specifically targeting two chloroplast barcoding genes matK and rbcL to reveal the plant taxonomic diversity present in a given environmental sample. Our approach leverages the sensitivity of hybridization capture and the capacity of high‐throughput DNA sequencing instruments. It builds on a new and universal method based on ancestral sequence reconstruction, ultimately limiting the number of bait‐probes required and reducing experimental costs, while accessing high levels of taxonomic diversity. Our bait‐set selectively targets four main plant orders (Fagales, Pinales, Asterales, and Poales), representing ~18% of all described vascular plants. This is achieved through the use of only 4084 baits, each 80 nucleotides in length (80‐mer), capturing ~1.0–1.6 k nucleotide sequences from each taxon. Tests on mock communities revealed important factors influencing capture efficiency and relative abundance estimates, including GC‐content, the overall target length per taxa, and the bait density and mean number of mismatches to the bait sequence. Our results show that hybridization capture, like metabarcoding, requires caution when interpreting results quantitatively within (paleo)‐ecological studies. Biases detected in this work have the potential to be mitigated with bait designs that avoid extreme base compositional biases and balancing bait targets across taxa. However, we strongly recommend the use of mock communities and read simulations to quantify the accuracy of taxonomic representation when using new bait designs.
Keywords