Genome Biology (Feb 2024)

Rapid and sensitive detection of genome contamination at scale with FCS-GX

  • Alexander Astashyn,
  • Eric S. Tvedte,
  • Deacon Sweeney,
  • Victor Sapojnikov,
  • Nathan Bouk,
  • Victor Joukov,
  • Eyal Mozes,
  • Pooja K. Strope,
  • Pape M. Sylla,
  • Lukas Wagner,
  • Shelby L. Bidwell,
  • Larissa C. Brown,
  • Karen Clark,
  • Emily W. Davis,
  • Brian Smith-White,
  • Wratko Hlavina,
  • Kim D. Pruitt,
  • Valerie A. Schneider,
  • Terence D. Murphy

DOI
https://doi.org/10.1186/s13059-024-03198-7
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 25

Abstract

Assembled genome sequences are being generated at an exponential rate. Here we present FCS-GX, part of NCBI’s Foreign Contamination Screen (FCS) tool suite, optimized to identify and remove contaminant sequences in new genomes. FCS-GX screens most genomes in 0.1–10 min. Testing FCS-GX on artificially fragmented genomes demonstrates high sensitivity and specificity for diverse contaminant species. We used FCS-GX to screen 1.6 million GenBank assemblies and identified 36.8 Gbp of contamination, comprising 0.16% of total bases, with half from 161 assemblies. We updated assemblies in NCBI RefSeq to reduce detected contamination to 0.01% of bases. FCS-GX is available at https://github.com/ncbi/fcs/ or https://doi.org/10.5281/zenodo.10651084 .

Keywords