Genome Biology (Feb 2024)
Rapid and sensitive detection of genome contamination at scale with FCS-GX
- Alexander Astashyn,
- Eric S. Tvedte,
- Deacon Sweeney,
- Victor Sapojnikov,
- Nathan Bouk,
- Victor Joukov,
- Eyal Mozes,
- Pooja K. Strope,
- Pape M. Sylla,
- Lukas Wagner,
- Shelby L. Bidwell,
- Larissa C. Brown,
- Karen Clark,
- Emily W. Davis,
- Brian Smith-White,
- Wratko Hlavina,
- Kim D. Pruitt,
- Valerie A. Schneider,
- Terence D. Murphy
Affiliations
- All authors: National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health
- DOI: https://doi.org/10.1186/s13059-024-03198-7
- Journal volume & issue: Vol. 25, no. 1, pp. 1–25
Abstract
Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI’s Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1–10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half of it coming from just 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084.
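As a sanity check on the headline figures, the total volume screened follows from dividing the detected contamination by its share of all bases: 36.8 Gbp / 0.0016 ≈ 23,000 Gbp, or roughly 23 Tbp. The sketch below does that arithmetic and also spells out the standard sensitivity and specificity definitions one would use when scoring contaminant calls on artificially fragmented genomes. Treating each fragment as a single positive or negative call is an assumption made here for illustration, the helper names are hypothetical, and the paper's actual evaluation protocol may differ.

```python
# Back-of-envelope checks on the abstract's headline figures.
# Only 36.8 Gbp and 0.16% come from the abstract; everything derived is an estimate.

CONTAMINATION_GBP = 36.8          # contamination found across GenBank assemblies
CONTAMINATION_FRACTION = 0.0016   # reported as 0.16% of total bases (rounded)


def implied_total_gbp(contam_gbp: float, fraction: float) -> float:
    """Total screened bases implied by a contaminant amount and its share of all bases."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return contam_gbp / fraction


def sensitivity(tp: int, fn: int) -> float:
    """True-positive rate: contaminant fragments correctly flagged (hypothetical helper)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True-negative rate: clean fragments correctly left unflagged (hypothetical helper)."""
    return tn / (tn + fp)


total = implied_total_gbp(CONTAMINATION_GBP, CONTAMINATION_FRACTION)
# 0.16% may be rounded from anywhere in [0.155%, 0.165%), so bracket the estimate.
low = implied_total_gbp(CONTAMINATION_GBP, 0.00165)
high = implied_total_gbp(CONTAMINATION_GBP, 0.00155)

if __name__ == "__main__":
    print(f"implied total screened: ~{total:,.0f} Gbp (range {low:,.0f} to {high:,.0f})")
    # Toy counts for illustration only; these are not results from the paper.
    print(f"sensitivity={sensitivity(90, 10):.2f}, specificity={specificity(990, 10):.2f}")
```

Because 0.16% is itself rounded, the implied total is best read as roughly 22 to 24 Tbp rather than an exact figure.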
Keywords