mBio
(Nov 2016)
The Effects of Signal Erosion and Core Genome Reduction on the Identification of Diagnostic Markers
Jason W. Sahl,
Adam J. Vazquez,
Carina M. Hall,
Joseph D. Busch,
Apichai Tuanyok,
Mark Mayo,
James M. Schupp,
Madeline Lummis,
Talima Pearson,
Kenzie Shippy,
Rebecca E. Colman,
Christopher J. Allender,
Vanessa Theobald,
Derek S. Sarovich,
Erin P. Price,
Alex Hutcheson,
Jonas Korlach,
John J. LiPuma,
Jason Ladner,
Sean Lovett,
Galina Koroleva,
Gustavo Palacios,
Direk Limmathurotsakul,
Vanaporn Wuthiekanun,
Gumphol Wongsuwan,
Bart J. Currie,
Paul Keim,
David M. Wagner
Affiliations
Jason W. Sahl
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Adam J. Vazquez
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Carina M. Hall
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Joseph D. Busch
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Apichai Tuanyok
Emerging Pathogens Institute, University of Florida, Gainesville, Florida, USA
Mark Mayo
Global and Tropical Health Division, Menzies School of Health Research, Darwin, Northern Territory, Australia
James M. Schupp
Translational Genomics Research Institute, Flagstaff, Arizona, USA
Madeline Lummis
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Talima Pearson
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Kenzie Shippy
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Rebecca E. Colman
Translational Genomics Research Institute, Flagstaff, Arizona, USA
Christopher J. Allender
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
Vanessa Theobald
Global and Tropical Health Division, Menzies School of Health Research, Darwin, Northern Territory, Australia
Derek S. Sarovich
Global and Tropical Health Division, Menzies School of Health Research, Darwin, Northern Territory, Australia
Erin P. Price
Global and Tropical Health Division, Menzies School of Health Research, Darwin, Northern Territory, Australia
Alex Hutcheson
Pacific Biosciences, University of Michigan, Ann Arbor, Michigan, USA
Jonas Korlach
Pacific Biosciences, University of Michigan, Ann Arbor, Michigan, USA
John J. LiPuma
Division of Pediatric Infectious Diseases, University of Michigan, Ann Arbor, Michigan, USA
Jason Ladner
Center for Genome Sciences, USAMRIID, Fort Detrick, Maryland, USA
Sean Lovett
Center for Genome Sciences, USAMRIID, Fort Detrick, Maryland, USA
Galina Koroleva
Center for Genome Sciences, USAMRIID, Fort Detrick, Maryland, USA
Gustavo Palacios
Center for Genome Sciences, USAMRIID, Fort Detrick, Maryland, USA
Direk Limmathurotsakul
Mahidol-Oxford Tropical Medicine Research Unit, Mahidol University, Bangkok, Thailand
Vanaporn Wuthiekanun
Mahidol-Oxford Tropical Medicine Research Unit, Mahidol University, Bangkok, Thailand
Gumphol Wongsuwan
Mahidol-Oxford Tropical Medicine Research Unit, Mahidol University, Bangkok, Thailand
Bart J. Currie
Global and Tropical Health Division, Menzies School of Health Research, Darwin, Northern Territory, Australia
Paul Keim
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
David M. Wagner
Center for Microbial Genetics and Genomics, Northern Arizona University, Flagstaff, Arizona, USA
DOI
https://doi.org/10.1128/mBio.00846-16
Journal volume & issue
Vol. 7, no. 5
Abstract
ABSTRACT Whole-genome sequence (WGS) data are commonly used to design diagnostic targets for the identification of bacterial pathogens. To do this effectively, genomics databases must be comprehensive enough to identify the strict core genome that is specific to the target pathogen. As additional genomes are analyzed, the core genome shrinks and target-specific regions erode because they turn out to be shared with related species, potentially resulting in false-positive and/or false-negative identifications.
IMPORTANCE A comparative analysis of 1,130 Burkholderia genomes identified unique markers for many named species, including the human pathogens B. pseudomallei and B. mallei. Because of core genome reduction and signature erosion, only 38 targets specific to B. pseudomallei/mallei were identified. Using only public genomes yielded a larger number of markers because of undersampling, and this larger number represents the potential for false positives. This analysis has implications for the design of diagnostics for other species where the genomic space of the target and/or closely related species is not well defined.
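The two effects described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes each genome can be reduced to a set of presence/absence markers, and the marker IDs (`m1`, `m2`, ...) and genome contents below are hypothetical toy data. A marker is "specific" here if it belongs to the strict core of the target genomes and occurs in no non-target genome. Adding a target genome shrinks the core (core genome reduction), and adding a near-neighbour genome removes shared markers (signal erosion).

```python
def core_genome(genomes):
    """Strict core: markers present in every genome of the group."""
    return set.intersection(*(set(g) for g in genomes))


def specific_markers(targets, non_targets):
    """Core markers of the target group that occur in no non-target genome."""
    seen_elsewhere = set().union(*non_targets)
    return core_genome(targets) - seen_elsewhere


# Hypothetical "public-only" sampling: two target genomes, one non-target.
public_targets = [
    {"m1", "m2", "m3", "m4", "m5"},
    {"m1", "m2", "m3", "m4", "m6"},
]
public_non_targets = [{"m5", "m6", "m7"}]

# Hypothetical expanded sampling: one more target genome that lacks m4
# (core genome reduction) and one more near-neighbour genome carrying m3
# (signal erosion).
expanded_targets = public_targets + [{"m1", "m2", "m3", "m7"}]
expanded_non_targets = public_non_targets + [{"m3", "m8"}]

public = specific_markers(public_targets, public_non_targets)
expanded = specific_markers(expanded_targets, expanded_non_targets)
print(sorted(public))    # ['m1', 'm2', 'm3', 'm4']
print(sorted(expanded))  # ['m1', 'm2']

# Markers that looked specific under undersampling but fail with more data:
# m3 now occurs in a non-target (potential false positive) and m4 is absent
# from one target genome (potential false negative).
print(sorted(public - expanded))  # ['m3', 'm4']
```

The public-only set reports more "specific" markers than the expanded set, mirroring in miniature the abstract's point that undersampling inflates the marker count.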