Applications in Plant Sciences (Jul 2021)
New targets acquired: Improving locus recovery from the Angiosperms353 probe set
Abstract
PREMISE: Universal target enrichment kits maximize utility across wide evolutionary breadth while minimizing the number of baits required to create a cost‐efficient kit. The Angiosperms353 kit has been successfully used to capture loci throughout the angiosperms, but the default target reference file includes sequence information from only 6–18 taxa per locus. Consequently, reads sequenced from on‐target DNA molecules may fail to map to references, resulting in fewer on‐target reads for assembly, and reducing locus recovery.
METHODS: We expanded the Angiosperms353 target file, incorporating sequences from 566 transcriptomes to produce a ‘mega353’ target file, with each locus represented by 17–373 taxa. This mega353 file is a drop‐in replacement for the original Angiosperms353 file in HybPiper analyses. We provide tools to subsample the file based on user‐selected taxon groups, and to incorporate other transcriptome or protein‐coding gene data sets.
RESULTS: Compared to the default Angiosperms353 file, the mega353 file increased the percentage of on‐target reads by an average of 32%, increased locus recovery at 75% length by 49%, and increased the total length of the concatenated loci by 29%.
DISCUSSION: Increasing the phylogenetic density of the target reference file results in improved recovery of target capture loci. The mega353 file and associated scripts are available at: https://github.com/chrisjackson-pellicle/NewTargets.
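The idea of subsampling a target file by taxon can be illustrated with a minimal Python sketch. This is not the authors' script from the NewTargets repository. It assumes FASTA headers in the `>Taxon-LocusID` form that HybPiper target files conventionally use; the function names `parse_fasta` and `subsample_targets`, the taxon codes `ABCD` and `WXYZ`, and the sequences are all hypothetical placeholders.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subsample_targets(lines: Iterable[str], keep_taxa: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep only records whose taxon label is in keep_taxa.

    Assumption: each header reads 'Taxon-LocusID', so the taxon is
    everything before the last hyphen.
    """
    keep = set(keep_taxa)
    kept = []
    for header, seq in parse_fasta(lines):
        taxon, _, _locus = header.rpartition("-")
        if taxon in keep:
            kept.append((header, seq))
    return kept


# Hypothetical three-record target file covering two loci.
example = """>ABCD-4471
ATGGCT
AAGTGA
>WXYZ-4471
ATGGCA
>ABCD-4527
ATGCCC
""".splitlines()

subset = subsample_targets(example, ["ABCD"])
for header, seq in subset:
    print(f">{header}\n{seq}")
```

In practice the list of taxa to keep would come from a user-selected group (for example, an order or family), and the filtered records would be written to a new target file for the downstream analysis.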
Keywords