Diversity (Mar 2025)
DNA Barcode Contamination Screen (DBCscreen): A Pipeline to Rapidly Detect DNA Barcode Contamination for Biodiversity Research
Abstract
Next-generation sequencing (NGS) data are expanding exponentially, accompanied by a concomitant growth in contamination from non-target species. Although usually regarded as undesirable, these contaminant sequences can provide valuable insights into the broad-scale diversity and distribution of the parasites or symbionts of the sequenced organisms. In this study, we developed a pipeline called DBCscreen (DNA Barcode Contamination screen) to explore biodiversity and distribution across a broad range of living organisms, based on a survey of DNA barcode contamination. We used DBCscreen to screen 39,302 eukaryotic assemblies in the NCBI TSA/WGS database and, after stringent filtering, identified 110,880 contaminated contigs related to DNA barcodes in 10,717 assemblies (about 27% of those screened). We then determined the taxonomic identity of these contaminants; their heterogeneous distribution patterns revealed complex relationships between the hosts (the assembly source organisms) and their associated parasites or symbionts (the contaminants). Finally, we describe several example applications of DBCscreen, such as identifying the contaminant organisms most frequently associated with a specific host (e.g., ticks) and determining which hosts are particularly prone to certain types of contamination (e.g., Wolbachia and nematodes).
Keywords