Frontiers in Microbiology (Mar 2022)
Comparing Long-Read Assemblers to Explore the Potential of a Sustainable Low-Cost, Low-Infrastructure Approach to Sequence Antimicrobial Resistant Bacteria With Oxford Nanopore Sequencing
Abstract
Long-read sequencing (LRS) can resolve repetitive regions, a limitation of short read (SR) data. Reduced cost and instrument size has led to a steady increase in LRS across diagnostics and research. Here, we re-basecalled FAST5 data sequenced between 2018 and 2021 and analyzed the data in relation to gDNA across a large dataset (n = 200) spanning a wide GC content (25–67%). We examined whether re-basecalled data would improve the hybrid assembly, and, for a smaller cohort, compared long read (LR) assemblies in the context of antimicrobial resistance (AMR) genes and mobile genetic elements. We included a cost analysis when comparing SR and LR instruments. We compared the R9 and R10 chemistries and reported not only a larger yield but increased read quality with R9 flow cells. There were often discrepancies with ARG presence/absence and/or variant detection in LR assemblies. Flye-based assemblies were generally efficient at detecting the presence of ARG on both the chromosome and plasmids. Raven performed more quickly but inconsistently recovered small plasmids, notably a ∼15-kb Col-like plasmid harboring blaKPC. Canu assemblies were the most fragmented, with genome sizes larger than expected. LR assemblies failed to consistently determine multiple copies of the same ARG as identified by the Unicycler reference. Even with improvements to ONT chemistry and basecalling, long-read assemblies can lead to misinterpretation of data. If LR data are currently being relied upon, it is necessary to perform multiple assemblies, although this is resource (computing) intensive and not yet readily available/useable.
Keywords