BMC Genomics (Jan 2024)
MAGICIAN: MAG simulation for investigating criteria for bioinformatic analysis
Abstract
Background
The possibility of recovering metagenome-assembled genomes (MAGs) from sequence reads allows for further insights into microbial communities and their members, possibly even allowing such sequences to be analyzed with tools designed for single-isolate genomes. Because the quality of results depends on sequence quality, the performance of single-isolate genome tools on MAGs should be tested beforehand. Bioinformatics can be leveraged to quickly create varied synthetic test sets with known composition for this purpose.
Results
We present MAGICIAN, a flexible, user-friendly pipeline for the simulation of MAGs. MAGICIAN combines a synthetic metagenome simulator with a metagenomic assembly and binning pipeline to simulate MAGs based on user-supplied input genomes. This allows users to test the performance of tools on MAGs while having a ground truth against which to compare results. Using MAGICIAN, we found that even very slight (1%) changes in depth of coverage can drastically affect whether a genome can be recovered. We also demonstrate the use of simulated MAGs by evaluating whether genomes obtained with MAGICIAN’s current default pipeline are suitable for analysis with the antimicrobial resistance gene identification tool ResFinder.
Conclusions
Using MAGICIAN, it is possible to simulate MAGs that, while generally high in quality, reflect issues encountered with real-world data, thus providing realistic best-case data. Evaluating the results of ResFinder analysis of these genomes revealed a risk of plausible-looking false positives. This underlines the need for pipeline validation so that researchers are aware of potential issues when interpreting real-world data. Furthermore, the effects of fluctuations in depth of coverage on genome recovery in our simulated “random sequencing” warrant further investigation and indicate that random subsampling of reads may affect the discovery of additional genomes.
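The abstract describes testing tools on simulated MAGs against a known ground truth and flags plausible-looking false positives in ResFinder output. The following is a minimal Python sketch of such a comparison. It assumes that both the tool's hits for a MAG and the genes known to be present in the corresponding input genome have already been reduced to sets of gene names. The function `compare_to_ground_truth`, the `Comparison` class, and the example gene lists are hypothetical illustrations and are not part of MAGICIAN or ResFinder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Hypothetical container for a set-based comparison against ground truth."""

    true_positives: frozenset[str]
    false_positives: frozenset[str]
    false_negatives: frozenset[str]

    @property
    def precision(self) -> float:
        # Fraction of reported genes that are actually in the input genome.
        reported = len(self.true_positives) + len(self.false_positives)
        return len(self.true_positives) / reported if reported else 1.0

    @property
    def recall(self) -> float:
        # Fraction of ground-truth genes that the tool reported for the MAG.
        expected = len(self.true_positives) + len(self.false_negatives)
        return len(self.true_positives) / expected if expected else 1.0


def compare_to_ground_truth(reported, expected) -> Comparison:
    """Compare gene names reported for a MAG with the genes of its source genome."""
    reported, expected = frozenset(reported), frozenset(expected)
    return Comparison(
        true_positives=reported & expected,
        false_positives=reported - expected,  # hits absent from the input genome
        false_negatives=expected - reported,  # genes missed in the MAG
    )


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    truth = {"blaTEM-1B", "tet(A)", "sul1"}
    mag_hits = {"blaTEM-1B", "sul1", "aph(6)-Id"}
    result = compare_to_ground_truth(mag_hits, truth)
    print(sorted(result.false_positives))  # ['aph(6)-Id']
    print(f"precision={result.precision:.2f} recall={result.recall:.2f}")
```

Any reported gene that is absent from the input genome counts as a false positive, which is the category the abstract warns about. Matching on exact gene names is a simplification; a real validation may need to account for allele variants or naming differences between databases.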
Keywords