Frontiers in Microbiology (Nov 2023)
Impact of wet-lab protocols on quality of whole-genome short-read sequences from foodborne microbial pathogens
Abstract
For successful elucidation of a food-borne infection chain, the availability of high-quality sequencing data from suspected microbial contaminants is a prerequisite. Commonly, those investigations are a joint effort undertaken by different laboratories and institutes. To analyze the extent of variability introduced by differing wet-lab procedures on the quality of the sequence data we conducted an interlaboratory study, involving four bacterial pathogens, which account for the majority of food-related bacterial infections: Campylobacter spp., Shiga toxin-producing Escherichia coli, Listeria monocytogenes, and Salmonella enterica. The participants, ranging from German federal research institutes, federal state laboratories to universities and companies, were asked to follow their routine in-house protocols for short-read sequencing of 10 cultures and one isolated bacterial DNA per species. Sequence and assembly quality were then analyzed centrally. Variations within isolate samples were detected with SNP and cgMLST calling. Overall, we found that the quality of Illumina raw sequence data was high with little overall variability, with one exception, attributed to a specific library preparation kit. The variability of Ion Torrent data was higher, independent of the investigated species. For cgMLST and SNP analysis results, we found that technological sequencing artefacts could be reduced by the use of filters, and that SNP analysis was more suited than cgMLST to compare data of different contributors. Regarding the four species, a minority of Campylobacter isolate data showed the in comparison highest divergence with regard to sequence type and cgMLST analysis. We additionally compared the assembler SPAdes and SKESA for their performance on the Illumina data sets of the different species and library preparation methods and found overall similar assembly quality metrics and cgMLST statistics.
Keywords