Synthetic and Systems Biotechnology (Jun 2023)
Cost-effective hybrid long-short read assembly delineates alternative GC-rich Streptomyces hosts for natural product discovery
Abstract
With the advent of rapid, automated in silico identification of biosynthetic gene clusters (BGCs), genomics presents vast opportunities to accelerate natural product (NP) discovery. However, Streptomyces, which are prolific NP producers, are exceptionally GC-rich (>80%) and highly repetitive within BGCs. These features pose challenges for sequencing and high-quality genome assembly, which are currently circumvented by intensive sequencing. Here, we outline a more cost-effective workflow that uses multiplex Illumina and Oxford Nanopore sequencing with hybrid long-short read assembly algorithms to generate high-quality genomes. Our protocol subjects long read-derived assemblies to up to 4 rounds of polishing with short reads to yield accurate BGC predictions. We successfully sequenced and assembled 8 GC-rich Streptomyces genomes, ranging in length from 7.1 to 12.1 Mb, with a median N50 of 8.2 Mb. Taxonomic analysis revealed previous misrepresentation among these strains and allowed us to propose a potentially new species, Streptomyces sydneybrenneri. Further comprehensive characterization of their biosynthetic, pan-genomic and antibiotic resistance features, especially for molecules derived from type I polyketide synthase (PKS) BGCs, reflected their potential as alternative NP hosts. Thus, the genome assemblies and insights presented here are envisioned to serve as a gateway for the scientific community to expand their avenues in NP discovery.
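For readers unfamiliar with the two assembly statistics quoted above, the following is a minimal sketch of how GC content and N50 are conventionally computed. It is not the authors' pipeline: the function names `gc_fraction` and `n50`, and the toy contig lengths, are illustrative assumptions and do not come from the study.

```python
from statistics import median


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (N and other codes ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def n50(contig_lengths) -> int:
    """Length of the contig at which the cumulative sum, taken from the
    longest contig downwards, first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (bp) for two made-up assemblies, not data from the paper.
toy_assemblies = {
    "strain_A": [8_000_000, 150_000, 50_000],
    "strain_B": [4_000_000, 3_000_000, 1_000_000],
}

n50_by_strain = {name: n50(lengths) for name, lengths in toy_assemblies.items()}
median_n50 = median(n50_by_strain.values())

print(n50_by_strain)
print(median_n50)
print(gc_fraction("GCGCGCGA"))
```

An N50 close to the total genome length, as in the hypothetical `strain_A`, indicates that most of the assembly sits in a single large contig, which is the kind of contiguity a median N50 of 8.2 Mb for 7.1 to 12.1 Mb genomes implies.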